Abstract: Let $(Y, (X_i)_{1\le i\le p})$ be a real zero mean Gaussian vector and $V$ be a subset of $\{1,\dots,p\}$.
Suppose we are given n i.i.d. replications of this vector. We propose a new test for testing that
$Y$ is independent of $(X_i)_{i\in\{1,\dots,p\}\setminus V}$ conditionally to $(X_i)_{i\in V}$ against the general alternative that it
is not. This procedure does not depend on any prior information on the covariance of X or the 
variance of Y and applies in a high-dimensional setting. It straightforwardly extends to test the 
neighbourhood of a Gaussian graphical model. The procedure is based on a model of Gaussian 
regression with random Gaussian covariates. We give non asymptotic properties of the test and 
we prove that it is rate optimal (up to a possible log(n) factor) over various classes of alternatives 
under some additional assumptions. Besides, it allows us to derive non asymptotic minimax rates of 
testing in this random design setting. Finally, we carry out a simulation study in order to evaluate 
the performance of our procedure. 

Key-words: Linear regression, Gaussian graphical models, multiple testing, adaptive testing,
minimax hypothesis testing, minimax separation rate, ellipsoid, goodness-of-fit

Tests of linear hypotheses for high-dimensional Gaussian regression models

Summary: We consider a zero mean Gaussian vector $(Y, (X_i)_{i\in\mathcal{I}})$ and a subset $V$
of $\mathcal{I}$. Suppose that we observe n independent replications of the vector $(Y, X)$. In this report,
we propose a procedure for testing the hypothesis that $Y$ is independent of $(X_i)_{i\in\mathcal{I}\setminus V}$ conditionally
to $(X_i)_{i\in V}$. This test requires no knowledge of the covariance matrix of
$X$ or of the variance of $Y$ and can be applied in a high-dimensional setting. Moreover, a
test of the neighbourhood of a Gaussian graphical model is easily derived from it. We compute the
power of the test from a non asymptotic point of view and prove that it achieves the optimal
separation rate (up to a log n factor) for various classes of alternatives. We thereby derive
minimax separation rates of hypotheses for this regression model. Finally, we
evaluate the performance of our procedure on simulated data.
1 Introduction

We consider the following regression model

$$Y = \sum_{i=1}^{p} \theta_i X_i + \epsilon, \qquad (1)$$

where $\theta$ is an unknown vector of $\mathbb{R}^p$. In the sequel, we note $\mathcal{I} := \{1,\dots,p\}$. The vector
$X := (X_i)_{1\le i\le p}$ follows a real zero mean Gaussian distribution with non singular covariance matrix
$\Sigma$ and $\epsilon$ is a real zero mean Gaussian random variable independent of $X$. Straightforwardly, the
variance of $\epsilon$ corresponds to the conditional variance of $Y$ given $X$, $\mathrm{var}(Y|X)$.



The variable selection problem for this model in a high-dimensional setting has recently attracted
a lot of attention. A large number of papers are now devoted to the design of new algorithms and
estimators which are computationally feasible and are proven to converge; see for instance the works of
Meinshausen and Bühlmann (2006), Candès and Tao (2007), Zhao and Yu (2006), Zou and Hastie (2005),
Bühlmann and Kalisch (2007) or Zhang and Huang (2008). A common
drawback of the previously mentioned estimation procedures is that they require restrictive conditions
on the covariance matrix $\Sigma$ in order to behave well. Our issue is the natural testing counterpart
of this variable selection problem: we aim at defining a computationally feasible testing procedure
that achieves an optimal rate for any covariance matrix $\Sigma$.



1.1 Presentation of the main results 

We are given n i.i.d. replications of the vector $(Y, X)$. Let us respectively note $\mathbf{Y}$ and $\mathbf{X}_i$ the
vectors of the n observations of $Y$ and $X_i$ for any $i \in \mathcal{I}$. Let $V$ be a subset of $\mathcal{I}$, then $X_V$ refers to
the set $\{X_i, i \in V\}$ and $\theta_V$ stands for the sequence $(\theta_i)_{i\in V}$. We first propose a collection of testing
procedures $T_\alpha$ of the null hypothesis "$\theta_{\mathcal{I}\setminus V} = 0$" against the general alternative "$\theta_{\mathcal{I}\setminus V} \neq 0$". These
procedures are based on the ideas of Baraud et al. (2003) in a random design. Their
definitions are very flexible as they require no prior knowledge of the covariance of $X$, the variance
of $\epsilon$, nor the variance of $Y$. Note that the property "$\theta_{\mathcal{I}\setminus V} = 0$" is equivalent to "$Y$ is independent
of $X_{\mathcal{I}\setminus V}$ conditionally to $X_V$". Hence, the procedure also makes it possible to test conditional independences and applies
to testing the graph of a Gaussian graphical model (see below). Contrary to most approaches in this
setting (e.g. Drton and Perlman (2007)), we are able to consider the difficult
case of tests in a high-dimensional setting: the number of covariates p is possibly much larger than
the number of observations n. Such situations arise in many statistical applications, for instance in genomics
or biomedical imaging. To our knowledge, the only testing procedures (e.g. Schäfer and Strimmer
(2005)) that could handle high-dimensional alternatives lack theoretical justification. In this
paper, we exhibit some tests $T_\alpha$ that are both computationally amenable and optimal in the minimax
sense.

From a theoretical perspective, we are able to control the Family Wise Error Rate (FWER) of
our testing procedures $T_\alpha$. Besides, we derive a general non asymptotic upper bound for their
power. Contrary to the various rates of convergence obtained in the estimation setting (e.g.
Meinshausen and Bühlmann (2006) or Candès and Tao (2007)), our upper bound holds for any
covariance matrix $\Sigma$. Then, we derive from it non asymptotic minimax rates of testing in the
Gaussian random design framework. While the minimax rates have long been known in the fixed
design Gaussian regression framework (e.g. Baraud (2002)), they were unknown in our setting. For
instance, if at most k components of $\theta$ are non-zero and if k is much smaller than p, we prove that
the minimax rate of testing is of order $\frac{k\log(p)}{n}$ when the covariates $X_i$ are independent. If the
covariates are dependent, we derive faster minimax rates. To our knowledge, these are the first
results for testing or estimation issues that illustrate minimax rates for dependent covariates. Afterwards,
we show analogous results when k is large, or when the vector $\theta$ belongs to some ellipsoid
or some collection of ellipsoids. For any of these alternatives, we exhibit some procedure $T_\alpha$ that
achieves the optimal rate (up to a possible log(n) factor). Finally, we illustrate the performance of the
procedure on simulated examples.



1.2 Application to Gaussian Graphical Models (GGM) 

Our work was originally motivated by the following question: let $(Z_j)_{j\in\mathcal{J}}$ be a random vector which
follows a zero mean Gaussian distribution whose covariance matrix $\Sigma'$ is non singular. We observe
n i.i.d. replications of this vector $Z$ and we are given a graph $\mathcal{G} = (\Gamma, E)$ where $\Gamma = \{1,\dots,|\mathcal{J}|\}$ and
$E$ is a set of edges in $\Gamma \times \Gamma$. How can we test that $Z$ is an undirected Gaussian graphical model
(GGM) with respect to the graph $\mathcal{G}$?

The random vector $Z$ is a GGM with respect to the graph $\mathcal{G} = (\Gamma, E)$ if for any couple $(i,j)$
which is not contained in the edge set $E$, $Z_i$ and $Z_j$ are independent, given the remaining variables.



See Lauritzen (1996) for definitions and main properties of GGM. Interest in these models
has grown as they allow the description of dependence structure in high-dimensional data. As such,
they are widely used in spatial statistics (Cressie, 1993; Rue and Held, 2005) or probabilistic expert
systems (Cowell et al., 1999). More recently, they have been applied to the analysis of microarray
data. The challenge is to infer the network regulating the expression of the genes using only a small
sample of data, see for instance Schäfer and Strimmer (2005), Kishino and
Waddell (2000), or Wille et al. (2004). This issue has motivated
the search for new estimation procedures to handle GGM in a high-dimensional setting.

It is beyond the scope of this paper to give an exhaustive review of these. Many of these graph
estimation methods are based on multiple testing procedures, see for instance Schäfer and Strimmer
(2005) or Wille and Bühlmann (2006). Other methods
are based on the variable selection procedures for high-dimensional data that we previously mentioned. For instance,
Meinshausen and Bühlmann (2006) proposed a computationally feasible
model selection algorithm using Lasso penalization. Huang et al. (2006) and Yuan
and Lin (2007) extend this method to infer directly the inverse covariance matrix
by minimizing the log-likelihood penalized by the $\ell^1$ norm.
While the issue of graph and covariance estimation is extensively studied, few theoretical results
are proved for the problem of hypothesis testing of GGM in a high-dimensional setting. We believe
that this issue is significant for two reasons: first, when considering a gene regulation network, the
biologists often have some prior knowledge of the graph and may want to test whether the microarray data
match their model. Second, when applying an estimation method in a high-dimensional setting,
it could be useful to test the estimated graph as some of these methods turn out to be too conservative.

Admittedly, some of the previously mentioned estimation methods are based on multiple testing.
However, as they are constructed for an estimation purpose, most of them do not take into account
some previous knowledge about the graph. This is for instance the case for the approaches of Drton
and Perlman (2007) and Schäfer and Strimmer (2005).
Some of the other existing procedures cannot be applied in a high-dimensional setting ($p > n$).
Finally, most of them lack theoretical justification in a non asymptotic way.

In a subsequent paper (Verzelen and Villers, 2007) we define a test of graph based on the
present work. It benefits from the ability of handling high-dimensional GGM and has minimax properties.
Besides, we show numerical evidence of its efficiency; see Verzelen and Villers (2007) for more details.
In this article, we shall only present the idea underlying our approach.

For any $j \in \mathcal{J}$, we note $N(j)$ the set of neighbours of $j$ in the graph $\mathcal{G}$. Testing that $Z$ is
a GGM with respect to $\mathcal{G}$ is equivalent to testing that the random variable $Z_j$ conditionally to
$(Z_l)_{l\in N(j)}$ is independent of $(Z_l)_{l\in\mathcal{J}\setminus(N(j)\cup\{j\})}$ for any $j \in \mathcal{J}$. As $Z$ follows a Gaussian distribution,
the distribution of $Z_j$ conditionally to the other variables decomposes as follows:

$$Z_j = \sum_{k\in\mathcal{J}\setminus\{j\}} \theta^j_k Z_k + \epsilon_j,$$

where $\epsilon_j$ is normal and independent of $(Z_k)_{k\in\mathcal{J}\setminus\{j\}}$. Then, the statement of conditional independency
is equivalent to $\theta^j_{\mathcal{J}\setminus(\{j\}\cup N(j))} = 0$. This approach based on conditional regression is also used
for estimation by Meinshausen and Bühlmann (2006).
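
To make the conditional regression decomposition concrete, here is a minimal Python sketch. It relies on the standard Gaussian identity $\theta^j_k = -K_{jk}/K_{jj}$, where $K$ denotes the inverse of the covariance matrix of $Z$; this identity, the toy chain graph and the helper name `conditional_regression_coefficients` are assumptions introduced for illustration and do not come from the text above.

```python
import numpy as np


def conditional_regression_coefficients(sigma, j):
    """Coefficients of the regression of Z_j on all the other coordinates.

    Uses theta^j_k = -K[j, k] / K[j, j], with K the inverse of sigma.
    The returned vector has the same length as Z; entry j is set to 0.
    """
    precision = np.linalg.inv(sigma)
    theta = -precision[j] / precision[j, j]
    theta[j] = 0.0
    return theta


# Toy chain graph 0 - 1 - 2: nodes 0 and 2 are not neighbours,
# so the precision matrix has a zero in position (0, 2).
precision = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])
sigma = np.linalg.inv(precision)
theta0 = conditional_regression_coefficients(sigma, 0)
print(theta0)
```

In this toy example the coefficient of $Z_2$ in the regression of $Z_0$ vanishes (up to rounding), which is exactly the statement that the edge between nodes 0 and 2 is absent.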



1.3 Organization of the Paper 

In Section 2, we present the approach of our procedure and connect it with the fixed design framework.
Besides, we define the notion of minimax rates of testing in this setting and gather the main
notations. We define the testing procedures $T_\alpha$ in Section 3 and we non asymptotically characterise
the set of vectors $\theta$ over which the test $T_\alpha$ is powerful. In Sections 4 and 5, we apply our procedure
to define tests and study their optimality for two different classes of alternatives. More precisely,
in Section 4 we test $\theta = 0$ against the class of $\theta$ whose components equal 0, except at most k of
them (k is supposed small). We define a test which under mild conditions achieves the minimax
rate of testing. When the covariates are independent, it is interesting to note that the minimax
rates exhibit the same ranges in our statistical model (1) and in the fixed design regression model (2).
In Section 5, we define two procedures that achieve the simultaneous minimax rates of testing over
large classes of ellipsoids (sometimes at the price of a log(p) factor). Besides, we show that the
problem of adaptation over classes of ellipsoids is impossible without a loss in efficiency. This was
previously pointed out in Spokoiny (1996) in the fixed design regression framework. The simulation
studies are presented in Section 6. Finally, Sections 7, 8 and the Appendix contain the proofs.



2 Description of the approach 

2.1 Connection with tests in fixed design regression 

Our work is directly inspired by the testing procedure of Baraud et al. (2003) in the
fixed design regression framework. Contrary to model (1), the problem of hypothesis testing in
fixed design regression has been extensively studied. This is why we will use the results in this
framework as a benchmark for the theoretical bounds in our model (1). Let us define this second
regression model:

$$Y_i = f_i + \sigma\epsilon_i, \quad i \in \{1,\dots,N\}, \qquad (2)$$

where $f$ is an unknown vector of $\mathbb{R}^N$, $\sigma$ some unknown positive number, and the $\epsilon_i$'s a sequence
of i.i.d. standard Gaussian random variables. The problem at hand is testing that $f$ belongs to a
linear subspace of $\mathbb{R}^N$ against the alternative that it does not. We refer to Baraud et al. (2003) for
a short review of non parametric tests in this framework. Besides, we are interested in the performance
of the procedures from a minimax perspective. To our knowledge, there have been no results
in model (1). However, there are numerous papers on this issue in the fixed design regression model.
First, we refer to the seminal work of Ingster (1993a,b,c) who gives asymptotic minimax
rates over non parametric alternatives. Our work is closely related to the results of Baraud
(2002) where he gives non asymptotic minimax rates of testing over ellipsoids or sparse signals.
Throughout the paper, we highlight the link between the minimax rates in fixed and in random
design.



2.2 Principle of our testing procedure 

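Let us briefly describe the idea underlying our testing procedure. A formal definition will follow in
Section 3. Let $m$ be a subset of $\mathcal{I}\setminus V$. We respectively define $S_V$ and $S_{V\cup m}$ as the linear subspaces
of $\mathbb{R}^p$ such that $\theta_{\mathcal{I}\setminus V} = 0$, respectively $\theta_{\mathcal{I}\setminus(V\cup m)} = 0$. We note $d$ and $D_m$ for the cardinalities of $V$
and $m$, and $N_m$ refers to $N_m = n - d - D_m$. If $N_m > 0$, we define the Fisher statistic $\phi_m$ by

$$\phi_m(\mathbf{Y},\mathbf{X}) := \frac{N_m \|\Pi_{V\cup m}\mathbf{Y} - \Pi_V\mathbf{Y}\|_n^2}{D_m \|\mathbf{Y} - \Pi_{V\cup m}\mathbf{Y}\|_n^2}, \qquad (3)$$

where $\Pi_V$ refers to the orthogonal projection onto the space generated by the vectors $(\mathbf{X}_i)_{i\in V}$ and
$\|.\|_n$ is the canonical norm in $\mathbb{R}^n$. We define the test statistic $\phi_{m,\alpha}(\mathbf{Y},\mathbf{X})$ as

$$\phi_{m,\alpha}(\mathbf{Y},\mathbf{X}) := \phi_m(\mathbf{Y},\mathbf{X}) - \bar{F}^{-1}_{D_m,N_m}(\alpha), \qquad (4)$$

where $\bar{F}_{D,N}(u)$ denotes the probability for a Fisher variable with $D$ and $N$ degrees of freedom
to be larger than $u$. Let us consider a finite collection $\mathcal{M}$ of non empty subsets of $\mathcal{I}\setminus V$ such that
for each $m \in \mathcal{M}$, $N_m > 0$. Our testing procedure consists of doing a Fisher test for each $m \in \mathcal{M}$.
We define $\{\alpha_m, m \in \mathcal{M}\}$ a suitable collection of numbers in $]0,1[$ (which possibly depends on $\mathbf{X}$).
For each $m \in \mathcal{M}$, we do the Fisher test $\phi_{m,\alpha_m}$ of level $\alpha_m$ of

$$H_0: \theta \in S_V \quad\text{against the alternative}\quad H_{1,m}: \theta \in S_{V\cup m}\setminus S_V,$$

and we decide to reject the null hypothesis if one of those Fisher tests does.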
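
As a rough illustration of a single Fisher test of the kind described above, the following Python sketch computes the statistic $\phi_m$ for nested sets of covariates $V \subset V\cup m$ and compares it with the upper $\alpha$-quantile of the Fisher distribution. The function names, the simulated data and the use of least squares to compute the projections are illustrative assumptions; the residual degrees of freedom are taken to be $N_m = n - d - D_m$.

```python
import numpy as np
from scipy.stats import f as fisher_dist


def residual_sum_of_squares(y, x, columns):
    """Squared norm of y minus its projection on the span of x[:, columns]."""
    if len(columns) == 0:
        return float(y @ y)
    design = x[:, list(columns)]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return float(residual @ residual)


def fisher_statistic(y, x, v, m):
    """Fisher statistic phi_m comparing the nested models V and V u m."""
    n, d, d_m = len(y), len(v), len(m)
    n_m = n - d - d_m
    if n_m <= 0:
        raise ValueError("need n - d - D_m > 0")
    rss_v = residual_sum_of_squares(y, x, v)
    rss_vm = residual_sum_of_squares(y, x, list(v) + list(m))
    # The two projections are nested, so by Pythagoras
    # ||P_{V u m} y - P_V y||^2 = rss_v - rss_vm.
    return n_m * (rss_v - rss_vm) / (d_m * rss_vm)


def fisher_test_statistic(y, x, v, m, alpha):
    """phi_{m,alpha}: positive exactly when the level-alpha Fisher test rejects."""
    n_m = len(y) - len(v) - len(m)
    threshold = fisher_dist.isf(alpha, len(m), n_m)  # upper alpha-quantile
    return fisher_statistic(y, x, v, m) - threshold


# Simulated example: Y depends on covariate 3, which is not in V = {0}.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 5))
y = 2.0 * x[:, 3] + rng.standard_normal(50)
print(fisher_test_statistic(y, x, v=[0], m=[3], alpha=0.05) > 0)
```

Working with residual sums of squares avoids forming the projection matrices explicitly, which keeps the sketch short; any other way of computing the two projections would do.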



The main advantage of our procedure is that it is very flexible in the choices of the models
$m \in \mathcal{M}$ and in the choices of the weights $\{\alpha_m\}$. Consequently, if we choose a suitable collection
$\mathcal{M}$, the test is powerful over a large class of alternatives as shown in Sections 3.3, 4 and 5.

Finally, let us mention that our procedure easily extends to the case where the expectation of
the random vector $(Y, X)$ is unknown. Let $\overline{\mathbf{X}}$ and $\overline{\mathbf{Y}}$ denote the projections of $\mathbf{X}$ and $\mathbf{Y}$ onto the
unit vector $\mathbf{1}$. Then, one only has to apply the procedure to $(\mathbf{Y} - \overline{\mathbf{Y}}, \mathbf{X} - \overline{\mathbf{X}})$ and to replace $d$ by
$d + 1$. The properties of the test remain unchanged and one can adapt all the proofs at the price
of more technicalities.
2.3 Minimax rates of testing

In order to examine the quality of our tests, we will compare their performance with the minimax
rates of testing. That is why we now define precisely what we mean by the $(\alpha,\delta)$-minimax rate of
testing over a set $\Theta$. We endow $\mathbb{R}^p$ with the Euclidean norm

$$\|\theta\|^2 := \theta^t\Sigma\theta = \mathrm{var}\Big(\sum_{i=1}^p \theta_i X_i\Big). \qquad (5)$$

As $\epsilon$ and $X$ are independent, we derive from the definition of $\|.\|^2$ that $\mathrm{var}(Y) = \|\theta\|^2 + \mathrm{var}(Y|X)$.
Let us remark that $\mathrm{var}(Y|X)$ does not depend on $X$. If we let $\|\theta\|$ vary, either the quantity $\mathrm{var}(Y)$
or $\mathrm{var}(Y|X)$ has to vary. In the sequel, we suppose that $\mathrm{var}(Y)$ is fixed. We briefly justify this
choice in Section 4.2. Consequently, if $\|\theta\|^2$ is increasing, then $\mathrm{var}(Y|X)$ has to decrease so that
the sum remains constant. Let $\alpha$ be a number in $]0;1[$ and let $\delta$ be a number in $]0;1-\alpha[$ (typically
small). For a given vector $\theta$, matrix $\Sigma$ and $\mathrm{var}(Y)$, we denote $\mathbb{P}_\theta$ the joint distribution of $(\mathbf{Y},\mathbf{X})$.
For the sake of simplicity, we do not emphasize the dependence of $\mathbb{P}_\theta$ on $\mathrm{var}(Y)$ or $\Sigma$. Let $\psi_\alpha$ be a
test of level $\alpha$ of the hypothesis "$\theta = 0$" against the hypothesis "$\theta \in \Theta\setminus\{0\}$". In our framework, it
is natural to measure the performance of $\psi_\alpha$ using the quantity $\rho(\psi_\alpha,\Theta,\delta,\mathrm{var}(Y),\Sigma)$ defined by:

$$\rho(\psi_\alpha,\Theta,\delta,\mathrm{var}(Y),\Sigma) := \inf\Big\{\rho>0,\ \inf\Big\{\mathbb{P}_\theta(\psi_\alpha=1),\ \theta\in\Theta \text{ and } \frac{\|\theta\|^2}{\mathrm{var}(Y)-\|\theta\|^2} \ge \rho^2\Big\} \ge 1-\delta\Big\}, \qquad (6)$$

where the quantity

$$\frac{\|\theta\|^2}{\mathrm{var}(Y)-\|\theta\|^2}$$

appears naturally as it corresponds to the ratio $\|\theta\|^2/\mathrm{var}(Y|X)$ which is the quantity of information
brought by $X$ (i.e. the signal) over the conditional variance of $Y$ (i.e. the noise). We aim at
describing the quantity

$$\inf_{\psi_\alpha}\rho(\psi_\alpha,\Theta,\delta,\mathrm{var}(Y),\Sigma) := \rho(\Theta,\alpha,\delta,\mathrm{var}(Y),\Sigma), \qquad (7)$$

where the infimum is taken over all the level-$\alpha$ tests $\psi_\alpha$. We call this quantity the $(\alpha,\delta)$-minimax
rate of testing over $\Theta$.
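
The signal-to-noise ratio that drives this definition is easy to evaluate numerically. The short Python sketch below assumes that the squared norm of $\theta$ is $\theta^t\Sigma\theta$, the variance of $\sum_i \theta_i X_i$, so that $\mathrm{var}(Y|X) = \mathrm{var}(Y) - \theta^t\Sigma\theta$; the function name `signal_to_noise` and the numerical values are purely illustrative.

```python
import numpy as np


def signal_to_noise(theta, sigma, var_y):
    """Ratio ||theta||^2 / (var(Y) - ||theta||^2) with ||theta||^2 = theta' Sigma theta."""
    theta = np.asarray(theta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    signal = float(theta @ sigma @ theta)   # variance explained by X
    noise = var_y - signal                  # conditional variance var(Y | X)
    if noise <= 0:
        raise ValueError("var(Y) must exceed theta' Sigma theta")
    return signal / noise


# Two correlated covariates, var(Y) held fixed at 4.
ratio = signal_to_noise([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]], var_y=4.0)
print(ratio)
```

Since $\mathrm{var}(Y)$ is held fixed, increasing the signal automatically shrinks the noise, which is why the ratio rather than the norm alone is used to measure the distance to the null hypothesis.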

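A dual notion of this $\rho$ function is the function $\beta_\Sigma$. For any $\Theta \subset \mathbb{R}^p$ and $\alpha \in ]0,1[$, we denote
$\beta_\Sigma(\Theta)$ the quantity

$$\beta_\Sigma(\Theta) := \inf_{\psi_\alpha}\sup_{\theta\in\Theta}\mathbb{P}_\theta[\psi_\alpha = 0],$$

where the infimum is taken over all level-$\alpha$ tests $\psi_\alpha$ and where we recall that $\Sigma$ refers to the covariance
matrix of $X$.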



2.4 Notations 

Let us recall the main notations that we shall use throughout the paper. In the sequel, n stands for
the number of independent observations, p is the number of covariates. Besides, $X_V$ stands for the
collection $(X_i)_{i\in V}$ of the covariates that correspond to the null hypothesis and $d$ is the cardinality
of the set $V$. The models $m$ are subsets of $\mathcal{I}\setminus V$ and we note $D_m$ their cardinality. $T_\alpha$ stands for
our testing procedure of level $\alpha$. The statistics $\phi_m$ and the test $\phi_{m,\alpha}$ are respectively defined in (3)
and (4). Finally, the norm $\|.\|$ is introduced in (5).
For $x, y \in \mathbb{R}$, we set

$$x \wedge y := \inf\{x,y\}, \qquad x \vee y := \sup\{x,y\}.$$

For any $u \in \mathbb{R}$, $\bar{F}_{D,N}(u)$ denotes the probability for a Fisher variable with $D$ and $N$ degrees of
freedom to be larger than $u$. In the sequel, $L, L_1, L_2, \dots$ denote constants that may vary from line
to line. The notation $L(.)$ specifies the dependency on some quantities. For the sake of simplicity,
we only give the orders of magnitude in the results and we refer to the proofs for explicit constants.



3 The Testing procedure 
3.1 Description of the procedure 

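Let us first fix some level $\alpha \in ]0,1[$. Throughout this paper, we suppose that $n > d+2$. Let us
consider a finite collection $\mathcal{M}$ of non empty subsets of $\mathcal{I}\setminus V$ such that for all $m \in \mathcal{M}$, $1 \le D_m \le
n-d-1$. We introduce the following test of level $\alpha$. We reject $H_0$: "$\theta \in S_V$" when the statistic

$$T_\alpha := \sup_{m\in\mathcal{M}}\Big\{\phi_m(\mathbf{Y},\mathbf{X}) - \bar{F}^{-1}_{D_m,N_m}\big(\alpha_m(\mathbf{X})\big)\Big\} \qquad (8)$$

is positive, where the collection of weights $\{\alpha_m(\mathbf{X}), m\in\mathcal{M}\}$ is chosen according to one of the two
following procedures:

$P_1$: The $\alpha_m$'s do not depend on $\mathbf{X}$ and satisfy the equality:

$$\sum_{m\in\mathcal{M}}\alpha_m = \alpha. \qquad (9)$$

$P_2$: For all $m\in\mathcal{M}$, $\alpha_m(\mathbf{X}) = q_{\mathbf{X},\alpha}$, the $\alpha$-quantile of the distribution of the random variable

$$\inf_{m\in\mathcal{M}}\bar{F}_{D_m,N_m}\left(\frac{\|\Pi_{V\cup m}(\boldsymbol{\epsilon})-\Pi_V(\boldsymbol{\epsilon})\|_n^2/D_m}{\|\boldsymbol{\epsilon}-\Pi_{V\cup m}(\boldsymbol{\epsilon})\|_n^2/N_m}\right) \qquad (10)$$

conditionally to $\mathbf{X}$.

Note that it is easy to compute the quantity $q_{\mathbf{X},\alpha}$. Let $Z$ be a standard Gaussian random vector
of size n independent of $\mathbf{X}$. As $\boldsymbol{\epsilon}$ is independent of $\mathbf{X}$, the distribution of (10) conditionally to $\mathbf{X}$
is the same as the distribution of

$$\inf_{m\in\mathcal{M}}\bar{F}_{D_m,N_m}\left(\frac{\|\Pi_{V\cup m}(Z)-\Pi_V(Z)\|_n^2/D_m}{\|Z-\Pi_{V\cup m}(Z)\|_n^2/N_m}\right)$$

conditionally to $\mathbf{X}$. Hence, we can easily work out its quantile using a Monte-Carlo method.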



Clearly, the computational complexity of the procedure is linear with respect to the size of the
collection of models $\mathcal{M}$ even when using Procedure $P_2$. Consequently, when we apply our procedure
to high-dimensional data as in Section 6 or in Verzelen and Villers (2007), we favour collections $\mathcal{M}$
whose size is linear with respect to the number of covariates p.
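
The Monte-Carlo computation of the quantile used by Procedure $P_2$ can be sketched as follows in Python. For each simulated standard Gaussian vector, the smallest Fisher p-value over the collection of models is recorded, and the empirical $\alpha$-quantile of these minima is returned. The function names, the number of draws and the toy design are illustrative assumptions, and the residual degrees of freedom are taken to be $N_m = n - d - D_m$.

```python
import numpy as np
from scipy.stats import f as fisher_dist


def residual_ss(z, x, columns):
    """Squared norm of z minus its projection on the span of x[:, columns]."""
    if len(columns) == 0:
        return float(z @ z)
    design = x[:, list(columns)]
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    residual = z - design @ coef
    return float(residual @ residual)


def monte_carlo_quantile(x, v, models, alpha, n_draws=1000, seed=0):
    """Empirical alpha-quantile of the minimal Fisher p-value, conditionally on x."""
    rng = np.random.default_rng(seed)
    n, d = x.shape[0], len(v)
    min_pvalues = np.empty(n_draws)
    for b in range(n_draws):
        z = rng.standard_normal(n)          # plays the role of the noise vector
        rss_v = residual_ss(z, x, v)
        smallest = 1.0
        for m in models:
            d_m = len(m)
            n_m = n - d - d_m
            rss_vm = residual_ss(z, x, list(v) + list(m))
            stat = n_m * (rss_v - rss_vm) / (d_m * rss_vm)
            smallest = min(smallest, fisher_dist.sf(stat, d_m, n_m))
        min_pvalues[b] = smallest
    return float(np.quantile(min_pvalues, alpha))


# Toy design: 30 observations, 6 covariates, one model per covariate.
rng = np.random.default_rng(1)
x = rng.standard_normal((30, 6))
models = [[i] for i in range(6)]
q = monte_carlo_quantile(x, v=[], models=models, alpha=0.05, n_draws=200)
print(q)
```

The cost is one pair of projections per model and per draw, so the running time grows linearly with the size of the collection, in line with the complexity remark above.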
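3.2 Comparison of Procedures $P_1$ and $P_2$

We respectively refer to $T^1_\alpha$ and $T^2_\alpha$ for the tests $T_\alpha$ associated with Procedures $P_1$ and $P_2$. First,
we are able to control the behavior of the test under the null hypothesis.

Proposition 1. The test $T^1_\alpha$ corresponds to a Bonferroni procedure and therefore satisfies

$$\mathbb{P}_\theta(T^1_\alpha > 0) \le \sum_{m\in\mathcal{M}}\alpha_m \le \alpha,$$

whereas the test $T^2_\alpha$ has the property to be of size exactly $\alpha$:

$$\mathbb{P}_\theta(T^2_\alpha > 0) = \alpha.$$

The proof is given in the Appendix. Besides, the test $T^2_\alpha$ is more powerful than the corresponding
test $T^1_\alpha$ defined with weights $\alpha_m = \alpha/|\mathcal{M}|$.

Proposition 2. For any parameter $\theta$ that does not belong to $S_V$, the procedure $T^1_\alpha$ with weights
$\alpha_m = \alpha/|\mathcal{M}|$ and the procedure $T^2_\alpha$ satisfy

$$\mathbb{P}_\theta(T^2_\alpha(\mathbf{X},\mathbf{Y}) > 0\,|\,\mathbf{X}) \ge \mathbb{P}_\theta(T^1_\alpha(\mathbf{X},\mathbf{Y}) > 0\,|\,\mathbf{X}) \quad \mathbf{X}\ \text{a.s.} \qquad (11)$$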

Again, the proof is given in the Appendix. On the one hand, the choice of Procedure $P_1$ avoids
the computation of the quantile $q_{\mathbf{X},\alpha}$ and possibly makes it possible to give a Bayesian flavor to the
choice of the weights. On the other hand, Procedure $P_2$ is more powerful than the corresponding
test with Procedure $P_1$. We will illustrate these considerations in Section 6. In Sections 3.3, 4 and
5 we study the power and rates of testing of $T_\alpha$ with Procedure $P_1$.
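
The Bonferroni behaviour of Procedure $P_1$ can be checked empirically. The Python sketch below computes the multiple testing statistic with uniform weights $\alpha_m = \alpha/|\mathcal{M}|$ and estimates its rejection frequency under the null hypothesis $\theta = 0$ by simulation. All names, sample sizes and the choice of one-covariate models are illustrative assumptions, with residual degrees of freedom taken as $N_m = n - d - D_m$.

```python
import numpy as np
from scipy.stats import f as fisher_dist


def rss(y, x, columns):
    """Squared norm of y minus its projection on the span of x[:, columns]."""
    if len(columns) == 0:
        return float(y @ y)
    design = x[:, list(columns)]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return float(residual @ residual)


def bonferroni_statistic(y, x, v, models, alpha):
    """Largest phi_m minus its threshold, with weights alpha_m = alpha / |M|."""
    n, d = len(y), len(v)
    alpha_m = alpha / len(models)
    rss_v = rss(y, x, v)
    values = []
    for m in models:
        d_m = len(m)
        n_m = n - d - d_m
        rss_vm = rss(y, x, list(v) + list(m))
        phi = n_m * (rss_v - rss_vm) / (d_m * rss_vm)
        values.append(phi - fisher_dist.isf(alpha_m, d_m, n_m))
    return max(values)


def empirical_level(n=40, p=8, alpha=0.05, n_rep=300, seed=0):
    """Rejection frequency of the test when theta = 0 (Y independent of X)."""
    rng = np.random.default_rng(seed)
    models = [[i] for i in range(p)]
    rejections = 0
    for _ in range(n_rep):
        x = rng.standard_normal((n, p))
        y = rng.standard_normal(n)
        if bonferroni_statistic(y, x, [], models, alpha) > 0:
            rejections += 1
    return rejections / n_rep


level = empirical_level()
print(level)
```

Under the null hypothesis the estimated level should stay close to or below $\alpha$, up to simulation noise; replacing the uniform weights by the Monte-Carlo quantile of Procedure $P_2$ would bring it closer to $\alpha$.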

3.3 Power of the Test 

We aim at describing a set of vectors $\theta$ in $\mathbb{R}^p$ over which the test defined in Section 3 with Procedure
$P_1$ is powerful. Since Procedure $P_2$ is more powerful than Procedure $P_1$ with $\alpha_m = \alpha/|\mathcal{M}|$, the test
with Procedure $P_2$ will also be powerful on this set of $\theta$.

Let $\alpha$ and $\delta$ be two numbers in $]0,1[$, and let $\{\alpha_m, m\in\mathcal{M}\}$ be weights such that $\sum_{m\in\mathcal{M}}\alpha_m \le \alpha$.
Let us define Hypothesis $(H_\mathcal{M})$ as follows:

$(H_\mathcal{M})$ For all $m\in\mathcal{M}$, $\alpha_m \ge \exp(-N_m/10)$ and $\delta \ge 2\exp(-N_m/21)$.

For typical choices of the collections $\mathcal{M}$ and $\{\alpha_m, m\in\mathcal{M}\}$, these conditions are fulfilled as
discussed in Sections 4 and 5. Let us now turn to the main result.

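Theorem 3. Let $T_\alpha$ be the test procedure defined by (8). We assume that $n > d+2$ and that
Assumption $(H_\mathcal{M})$ holds. Then, $\mathbb{P}_\theta(T_\alpha > 0) \ge 1-\delta$ for all $\theta$ belonging to the set

$$\mathcal{F}_\mathcal{M}(\delta) := \Big\{\theta\in\mathbb{R}^p,\ \exists m\in\mathcal{M}: \frac{\mathrm{var}(Y|X_V)-\mathrm{var}(Y|X_{V\cup m})}{\mathrm{var}(Y|X_{V\cup m})} \ge \Delta(m)\Big\},$$

where

$$\Delta(m) := \frac{L_1\sqrt{D_m\log\big(\frac{2}{\alpha_m\delta}\big)}\Big(1+\sqrt{\frac{D_m}{N_m}}\Big) + L_2\Big(1+2\sqrt{\frac{D_m}{N_m}}\Big)\log\big(\frac{2}{\alpha_m\delta}\big)}{n-d}. \qquad (12)$$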
This result is similar to Theorem 1 in Baraud et al. (2003) in the fixed design regression framework
and the same comment also holds: the test $T_\alpha$ under Procedure $P_1$ has a power comparable to the
best of the tests among the family $\{\phi_{m,\alpha}, m\in\mathcal{M}\}$. Indeed, let us assume for instance that $V = \{0\}$
and that the $\alpha_m$ are chosen to be equal to $\alpha/|\mathcal{M}|$. The test $T_\alpha$ defined by (8) is equivalent to doing
several tests of $\theta = 0$ against $\theta\in S_m$ at level $\alpha_m$ for $m\in\mathcal{M}$ and it rejects the null hypothesis if
one of those tests does. From Theorem 3, we know that under Hypothesis $(H_\mathcal{M})$ this test has a
power greater than $1-\delta$ over the set of vectors $\theta$ belonging to $\bigcup_{m\in\mathcal{M}}\mathcal{F}'_m(\delta,\alpha_m)$ where $\mathcal{F}'_m(\delta,\alpha_m)$
is the set of vectors $\theta\in\mathbb{R}^p$ such that

$$\frac{\mathrm{var}(Y)-\mathrm{var}(Y|X_m)}{\mathrm{var}(Y|X_m)} \ge \frac{L(D_m,N_m)}{n}\left(\sqrt{D_m\log\Big(\frac{2}{\alpha_m\delta}\Big)} + \log\Big(\frac{2}{\alpha_m\delta}\Big)\right). \qquad (13)$$

Besides, $L(D_m,N_m)$ behaves like a constant if the ratio $D_m/N_m$ is bounded. Let us compare this
result with the set of $\theta$ over which the Fisher test $\phi_{m,\alpha}$ at level $\alpha$ has a power greater than $1-\delta$.
Applying Theorem 3, we know that it contains $\mathcal{F}'_m(\delta,\alpha)$. Moreover, the following Proposition shows
that it is not much larger than $\mathcal{F}'_m(\delta,\alpha)$:

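Proposition 4. Let $\delta\in]0,1-\alpha[$. If

$$\frac{\mathrm{var}(Y)-\mathrm{var}(Y|X_m)}{\mathrm{var}(Y|X_m)} \le L(\alpha,\delta)\frac{\sqrt{D_m}}{n},$$

then $\mathbb{P}_\theta(\phi_{m,\alpha} > 0) \le 1-\delta$.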

The proof is postponed to Section 8 and is based on a lower bound of the minimax rate of
testing.

$\mathcal{F}'_m(\delta,\alpha)$ and $\mathcal{F}'_m(\delta,\alpha_m)$ defined in (13) differ in that $\log(1/\alpha)$ is replaced by $\log(1/\alpha_m)$.
For the main applications that we will study in Sections 4, 5 and 6, the ratio $\log(1/\alpha_m)/\log(1/\alpha)$
is of order $\log(n)$, $\log\log n$, or $k\log(ep/k)$ where $k$ is a "small" integer. Thus, for each $\delta\in]0,1-\alpha[$,
the test based on $T_\alpha$ has a power greater than $1-\delta$ over a class of vectors which is close to
$\bigcup_{m\in\mathcal{M}}\mathcal{F}'_m(\delta,\alpha)$. It follows that for each $\theta$ the power of this test under $\mathbb{P}_\theta$ is comparable to the
best of the tests among the family $\{\phi_{m,\alpha}, m\in\mathcal{M}\}$.

In the next two sections, we use this theorem to establish rates of testing against different types
of alternatives. First, we give an upper bound for the rate of testing $\theta = 0$ against a class of
$\theta$ for which a lot of components are equal to 0. In Section 5, we study the rates of testing and
simultaneous rates of testing $\theta = 0$ against classes of ellipsoids. For the sake of simplicity, we will
only consider the case $V = \{0\}$. Nevertheless, the procedure $T_\alpha$ defined in (8) applies in the same
way when one considers more complex null hypotheses and the rates of testing are unchanged except
that we have to replace $n$ by $n-d$ and $\mathrm{var}(Y)$ by $\mathrm{var}(Y|X_V)$.



4 Detecting non-zero coordinates 

Let us fix an integer k between 1 and p. In this section, we are interested in testing $\theta = 0$
against the class of $\theta$ with at most k non-zero components. This typically corresponds to the
situation encountered when considering tests of neighborhood for large sparse graphs. As the
graph is assumed to be sparse, only a small number of neighbors are missing under the alternative
hypothesis.

For each pair of integers $(k,p)$ with $k\le p$, let $\mathcal{M}(k,p)$ be the class of all subsets of $\mathcal{I} = \{1,\dots,p\}$
of cardinality k. The set $\Theta[k,p]$ stands for the subset of vectors $\theta\in\mathbb{R}^p$ such that at most k
coordinates of $\theta$ are non-zero.

First, we define a test $T_\alpha$ of the form (8) with Procedure $P_1$, and we derive an upper bound for
the rate of testing of $T_\alpha$ against the alternative $\theta\in\Theta[k,p]$. Then, we show that this procedure is
rate optimal when all the covariates are independent. Finally, we study the optimality of the test
when k = 1 for some examples of covariance matrix $\Sigma$.



4.1 Rate of testing of $T_\alpha$

Proposition 5. We consider the set of models $\mathcal{M} = \mathcal{M}(k,p)$. We use the test $T_\alpha$ under Procedure
$P_1$ and we take the weights $\alpha_m$ all equal to $\alpha/|\mathcal{M}|$. Let us suppose that n satisfies:

$$n \ge L\left[\log\Big(\frac{2}{\alpha\delta}\Big) + k\log\Big(\frac{ep}{k}\Big)\right]. \qquad (14)$$

Let us set the quantity

$$(\rho'_{k,p,n})^2 := L(\alpha,\delta)\,\frac{k\log\big(\frac{ep}{k}\big)}{n}. \qquad (15)$$

For any $\theta$ in $\Theta[k,p]$ such that

$$\frac{\|\theta\|^2}{\mathrm{var}(Y)-\|\theta\|^2} \ge (\rho'_{k,p,n})^2,$$

$\mathbb{P}_\theta(T_\alpha > 0) \ge 1-\delta$.

We recall that the norm $\|.\|$ is defined in (5) and that $\|\theta\|^2$ equals $\mathrm{var}(Y)-\mathrm{var}(Y|X)$. This proposition
easily follows from Theorem 3 and its proof is given in Section 7. Note that the upper bound does
not directly depend on the covariance matrix of the vector $X$. Besides, Hypothesis (14) corresponds
to the minimal assumption needed for consistency and type-oracle inequalities in the estimation
setting as pointed out by Wainwright (2007, Th. 2) and Giraud (2008, Sect.
3.1). Hence, we conjecture that Hypothesis (14) is minimal so that Proposition 5 holds. We will
further discuss the bound (15) after deriving lower bounds for the minimax rate of testing.
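
To see why a term of order $k\log(ep/k)$ appears with the uniform weights $\alpha_m = \alpha/|\mathcal{M}(k,p)|$, the following Python sketch evaluates $\log(1/\alpha_m)$, compares it with the classical bound $\log\binom{p}{k} \le k\log(ep/k)$, and computes the smallest number of residual degrees of freedom $N_m$ compatible with the condition $\alpha_m \ge \exp(-N_m/10)$ of Section 3.3. The function names are hypothetical and the numerical values are only examples.

```python
import math


def log_inverse_weight(k, p, alpha):
    """log(1 / alpha_m) for the uniform weights alpha_m = alpha / C(p, k)."""
    return math.log(math.comb(p, k)) - math.log(alpha)


def entropy_bound(k, p):
    """Classical upper bound k log(e p / k) on log C(p, k)."""
    return k * math.log(math.e * p / k)


def minimal_residual_df(k, p, alpha):
    """Smallest integer N_m such that alpha_m >= exp(-N_m / 10)."""
    return math.ceil(10.0 * log_inverse_weight(k, p, alpha))


for k, p in [(1, 100), (3, 100), (5, 1000)]:
    print(k, p, math.log(math.comb(p, k)), entropy_bound(k, p),
          minimal_residual_df(k, p, alpha=0.05))
```

The required sample size therefore grows like $k\log(p/k)$ rather than like $p$, which is what allows the procedure to be used when p is much larger than n.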



4.2 Minimax lower bounds for independent covariates 

In the statistical framework considered here, the problem of giving minimax rates of testing under
no prior knowledge of the covariance of $X$ and of $\mathrm{var}(Y)$ is open. This is why we shall only derive
lower bounds when $\mathrm{var}(Y)$ and the covariance matrix of $X$ are known. In this section, we give
non asymptotic lower bounds for the $(\alpha,\delta)$-minimax rate of testing over the set $\Theta[k,p]$ when the
covariance matrix of $X$ is the identity matrix (except Proposition 6). As these bounds coincide
with the upper bound obtained in Section 4.1, this will show that our test $T_\alpha$ is rate optimal.

We first give a lower bound for the $(\alpha,\delta)$-minimax rate of detection of all p non-zero coordinates
for any covariance matrix $\Sigma$.

Proposition 6. Let us suppose that $\mathrm{var}(Y)$ is known. Let us set $\rho^2_{p,n}$ such that:

$$\rho^2_{p,n} := L(\alpha,\delta)\frac{\sqrt{p}}{n}. \qquad (16)$$

Then for all $\rho < \rho_{p,n}$,

$$\beta_\Sigma\Big(\Big\{\theta\in\Theta[p,p],\ \frac{\|\theta\|^2}{\mathrm{var}(Y)-\|\theta\|^2} = \rho^2\Big\}\Big) \ge \delta,$$

where we recall that $\Sigma$ is the covariance matrix of $X$.

If $n \ge (1+\gamma)p$ for some $\gamma > 0$, Theorem 3 shows that the test $\phi_{\mathcal{I},\alpha}$ defined in (4) has power
greater than $\delta$ over the vectors $\theta$ that satisfy

$$\frac{\|\theta\|^2}{\mathrm{var}(Y)-\|\theta\|^2} \ge L(\gamma,\alpha,\delta)\frac{\sqrt{p}}{n}.$$

Hence, $\sqrt{p}/n$ is the minimax rate of testing $\Theta[p,p]$ at least when the number of observations is
larger than the number of covariates. This is coherent with the minimax rate obtained in the fixed
design framework (e.g. Baraud (2002)). When p becomes larger we do not think that the lower
bound given in Proposition 6 is still sharp. Note that this minimax rate of testing holds for any
covariance matrix $\Sigma$ contrary to Theorem 7.

We now turn to the lower bound for the $(\alpha,\delta)$-minimax rate of testing against $\theta\in\Theta[k,p]$.

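Theorem 7. Let us set $\rho^2_{k,p,n}$ such that

$$\rho^2_{k,p,n} := L(\alpha,\delta)\,\frac{k}{n}\log\Big(1+\frac{p}{k^2}+\sqrt{\frac{2p}{k^2}}\Big). \qquad (17)$$

We suppose that the covariance of $X$ is the identity matrix $I$. Then, for all $\rho < \rho_{k,p,n}$,

$$\beta_I\Big(\Big\{\theta\in\Theta[k,p],\ \frac{\|\theta\|^2}{\mathrm{var}(Y)-\|\theta\|^2} = \rho^2\Big\}\Big) \ge \delta,$$

where the quantity $\mathrm{var}(Y)$ is known.
If $\alpha+\delta \le 53\%$, then one has

$$\rho^2_{k,p,n} \ge \frac{k}{2n}\log\Big(1+\frac{p}{k^2}\vee\sqrt{\frac{p}{k^2}}\Big).$$

This result implies the following lower bound for the minimax rate of testing

$$\rho(\Theta[k,p],\alpha,\delta,\mathrm{var}(Y),I) \ge \rho_{k,p,n}.$$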

The proof is given in Section 8. At the price of more technicalities, it is possible to prove that
the lower bound still holds if the variables $(X_i)$ are independent with known, possibly
different, variances. Theorem 7 approximately recovers the lower bounds for the minimax rates of testing in the
signal detection framework obtained by Baraud (2002). The main difference lies in the fact
that we suppose $\mathrm{var}(Y)$ known, which in the signal detection framework translates into the fact that
we would know the quantity $\|f\|^2+\sigma^2$.

We are now in position to compare the results of Proposition 5 and Theorem 7. Let us distinguish
between the values of k.
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When k < p'^ for some 7 < 1/2, if n is large enough to satisfy the assumption of Proposition^ 
the quantities ^ „ and p'^p ^ are both of the order times a constant (which depends 

on 7, a, and 6). This shows that the lower bound given in Theorem [7| is sharp. Additionally, 
in this case, the procedure Tq, defined in Proposition [5] follows approximately the minimax 
rate of testing. We recall that our procedure Ta does not depend on the knowledge of var(F) 
and corr(X). In apphcations, a small k typically corresponds to testing a Gaussian graphical 
model with respect to a graph G, when the number of nodes is large and the graph is supposed 
to be sparse. When n does not satisfy the assumption of Proposition [H we believe that our 
lower bound is not sharp anymore. 

• When $\sqrt{p} \le k \le p$, the lower bound and the upper bound do not coincide anymore. Nevertheless,
if $n \ge (1+\gamma)p$ for some $\gamma > 0$, Theorem 3 shows that the test $\phi_{\mathcal{I},\alpha}$ defined in (0]) has
power greater than $1 - \delta$ over the vectors $\theta$ that satisfy

$$\frac{\|\theta\|^2}{\mathrm{var}(Y) - \|\theta\|^2} \ge L(\gamma,\alpha,\delta)\,\frac{\sqrt{p}}{n}. \qquad (18)$$

This upper bound and the lower bound do not depend on $k$. Here again, the lower bound
obtained in Theorem 7 is sharp and the test $\phi_{\mathcal{I},\alpha}$ defined previously is rate optimal. The fact
that the rate of testing stabilizes around $\sqrt{p}/n$ for $k > \sqrt{p}$ also appears in signal detection,
and there is a discussion of this phenomenon in Baraud (2002).



• When $k \le \sqrt{p}$ and $k$ is close to $\sqrt{p}$, the lower bound and the upper bound given by Proposition
5 differ by at most a $\log(p)$ factor. For instance, if $k$ is of order $\sqrt{p}/\log p$, the lower bound
in Theorem 7 is of order $\sqrt{p}\,\log\log p/\log p$ and the upper bound is of order $\sqrt{p}$. We do not
know if any of these bounds is sharp, nor if the minimax rates of testing coincide when $\mathrm{var}(Y)$
is fixed and when it is not fixed.

All in all, the minimax rates of testing exhibit the same range of rates in our framework as in
signal detection (Baraud, 2002) when the covariates are independent. Moreover, this implies that
the minimax rate of testing is slower when the $(X_i)_{i\in\mathcal{I}}$ are independent than for any other form
of dependence. Indeed, the upper bounds obtained in Proposition 5 and in (18) do not depend
on the covariance of $X$. Then, a natural question arises: is the test statistic $T_\alpha$ rate optimal for
other correlations of $X$? We will partially answer this question when testing against the alternative
$\theta \in \Theta[1,p]$.

4.3 Minimax rates for dependent covariates 

In this section, we look for the minimax rate of testing $\theta = 0$ against $\theta \in \Theta[1,p]$ when the covariates
$X_i$ are no longer independent. We know that this rate is between the orders $1/n$, which is the minimax
rate of testing when we know which coordinate is non-zero, and $\log(p)/n$, the minimax rate of testing
for independent covariates.

Proposition 8. Let us suppose that there exists a positive number $c$ such that for any $i \neq j$,

$$|\mathrm{corr}(X_i, X_j)| \le c$$

and that $\alpha + \delta \le 53\%$. We define $\rho^2_{1,p,n,c}$ as

$$\rho^2_{1,p,n,c} := \frac{L(\alpha,\delta)}{n}\left(\log(p) \wedge \frac{1}{c}\right). \qquad (19)$$

Then for any $\rho < \rho_{1,p,n,c}$,

where $\Sigma$ refers to the covariance matrix of $X$.

Remark: If the correlation between the covariates is smaller than $1/\log(p)$, then the minimax
rate of testing is of the same order as in the independent case. If the correlation between the
covariates is larger, we show in the following Proposition that, under some additional assumption,
the rate is faster.

Proposition 9. Let us suppose that the correlation between $X_i$ and $X_j$ is exactly $c > 0$ for any
$i \neq j$. Moreover, we assume that $n$ satisfies the following condition:

l + logf4)l (20) 



n> L 

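Let us introduce the random variable $X_{p+1} := \frac{1}{p}\sum_{i=1}^{p} \frac{X_i}{\sqrt{\mathrm{var}(X_i)}}$. If $\alpha < 60\%$ and $\delta < 60\%$, the test
$T_\alpha$ defined by

$$T_\alpha := \Big[\sup_{1\le i\le p} \phi_{\{i\},\alpha/(2p)}\Big] \vee \phi_{\{p+1\},\alpha/2}$$

satisfies

$$\mathbb{P}_0(T_\alpha > 0) \le \alpha \quad \text{and} \quad \mathbb{P}_\theta(T_\alpha > 0) \ge 1 - \delta,$$

for any $\theta$ in $\Theta[1,p]$ such that

$$\frac{\|\theta\|^2}{\mathrm{var}(Y) - \|\theta\|^2} \ge L(\alpha,\delta)\,\frac{\log p \wedge \frac{1}{c}}{n}.$$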

Consequently, when the correlation between $X_i$ and $X_j$ is a positive constant $c$, the minimax
rate of testing is of order $\frac{\log(p)\wedge(1/c)}{n}$. When the correlation coefficient $c$ is small, the minimax rate
of testing coincides with the independent case, and when $c$ is larger those rates differ. Therefore,
the test $T_\alpha$ defined in Proposition 5 is not rate optimal when the correlation is known and is large.
Indeed, when the correlation between the covariates is large, the test statistics $\phi_{m,\alpha_m}$ defining
$T_\alpha$ are highly correlated. The choice of the weights $\alpha_m$ in Procedure $P_1$ corresponds to a Bonferroni
procedure, which is precisely known to behave badly when the tests are positively correlated.
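
To see why averaging the covariates helps under constant correlation, the following short computation may be useful. It is a sketch under the assumptions that the covariates have unit variance and common correlation $c$, and that $Y = \theta_1 X_1 + \epsilon$ with $\epsilon$ independent of $X$; it gives the variance of $X_{p+1}$ and the part of $\mathrm{var}(Y)$ explained by $X_{p+1}$, matching the identities used in the proof of Proposition 9.

```latex
\begin{align*}
\mathrm{var}(X_{p+1}) &= \frac{1}{p^2}\big[p + p(p-1)c\big] = c + \frac{1-c}{p},\\
\mathrm{cov}(Y, X_{p+1}) &= \frac{\theta_1}{p}\big[1 + (p-1)c\big] = \theta_1\Big(c + \frac{1-c}{p}\Big),\\
\frac{\mathrm{cov}(Y, X_{p+1})^2}{\mathrm{var}(X_{p+1})} &= \theta_1^2\Big(c + \frac{1-c}{p}\Big).
\end{align*}
```

When $c$ is bounded away from zero, the single variable $X_{p+1}$ therefore captures a fraction of order $c$ of the signal $\|\theta\|^2 = \theta_1^2$, whatever the value of $p$.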

This example illustrates the limits of Procedure $P_1$. However, it is not very realistic to suppose
that the covariates have a constant correlation, for instance when one considers a GGM. Indeed,
we expect that the correlation between two covariates is large if they are neighbors in the graph
and smaller if they are far apart (w.r.t. the graph distance). This is why we derive lower bounds of the
rate of testing for other kinds of correlation matrices often used to model stationary processes.

Proposition 10. Let $X_1, \ldots, X_p$ form a stationary process on the one dimensional torus. More
precisely, the correlation between $X_i$ and $X_j$ is a function of $|i-j|_p$ where $|\cdot|_p$ refers to the toroidal
distance defined by:

$$|i-j|_p := (|i-j|) \wedge (p - |i-j|).$$

$\Sigma_1(w)$ and $\Sigma_2(t)$ respectively refer to the correlation matrix of $X$ such that

$$\mathrm{corr}(X_i, X_j) = \exp(-w|i-j|_p) \quad \text{where } w > 0,$$
$$\mathrm{corr}(X_i, X_j) = (1 + |i-j|_p)^{-t} \quad \text{where } t > 0.$$

Let us set $\rho^2_{1,p,n,\Sigma_1(w)}$ and $\rho^2_{1,p,n,\Sigma_2(t)}$ such that:



if t>l 

if t^l 
if 0<t<l 



> s. 

If the range $w$ is larger than $1/p^{\gamma}$ or if the range $t$ is larger than $\gamma$ for some $\gamma < 1$, these lower
bounds are of order $\log(p)/n$. As a consequence, for any of these correlation models the minimax rate of
testing is of the same order as the minimax rate of testing for independent covariates. This means
that our test $T_\alpha$ defined in Proposition 5 is rate-optimal for these correlation matrices. However,
if $w$ is smaller than $1/p$ or if $t$ is smaller than $1/\log(p)$, we recover the parametric rate $1/n$, which
is achieved by the test $\phi_{\{p+1\},\alpha}$. This comes from the fact that the correlation $\mathrm{corr}(X_i, X_j)$ does
not converge to zero for such choices of $w$ or $t$. We omit the details since the arguments are similar
to the proof of Proposition 9.
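
The toroidal distance and the two correlation models of Proposition 10 are easy to tabulate. The following is a minimal Python sketch using only the standard library; the function names are illustrative and do not come from the paper.

```python
import math


def toroidal_distance(i, j, p):
    """Toroidal distance |i - j|_p = min(|i - j|, p - |i - j|) on p points."""
    d = abs(i - j)
    return min(d, p - d)


def corr_exponential(i, j, p, w):
    """Correlation model Sigma_1(w): exp(-w * |i - j|_p), with w > 0."""
    return math.exp(-w * toroidal_distance(i, j, p))


def corr_polynomial(i, j, p, t):
    """Correlation model Sigma_2(t): (1 + |i - j|_p)^(-t), with t > 0."""
    return (1 + toroidal_distance(i, j, p)) ** (-t)


def correlation_matrix(p, corr, **params):
    """Build the p x p correlation matrix for indices 1, ..., p."""
    return [[corr(i, j, p, **params) for j in range(1, p + 1)]
            for i in range(1, p + 1)]


# Indices 1 and 10 are neighbours on a torus with 10 points.
sigma1 = correlation_matrix(10, corr_exponential, w=0.5)
sigma2 = correlation_matrix(10, corr_polynomial, t=1.0)
```

Small values of $w$ or $t$ keep all entries of the matrix far from zero, which is the regime where the parametric rate is recovered.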

To conclude, when $k \le p^{\gamma}$ (for $\gamma < 1/2$), the test $T_\alpha$ defined in Proposition 5 is approximately
$(\alpha,\delta)$-minimax against the alternative $\theta \in \Theta[k,p]$ when neither $\mathrm{var}(Y)$ nor the covariance matrix
of $X$ is fixed. Indeed, the rate of testing of $T_\alpha$ coincides (up to a constant) with the supremum of
the minimax rates of testing on $\Theta[k,p]$ over all possible covariance matrices $\Sigma$:

$$\rho(\Theta[k,p], \alpha, \delta) := \sup_{\mathrm{var}(Y) > 0,\ \Sigma > 0} \rho(\Theta[k,p], \alpha, \delta, \mathrm{var}(Y), \Sigma),$$

where the supremum is taken over all positive $\mathrm{var}(Y)$ and every positive definite matrix $\Sigma$. When
$k \ge \sqrt{p}$ and when $n \ge (1+\gamma)p$ (for $\gamma > 0$), the test defined in (18) has the same behavior.

However, our procedure does not adapt to $\Sigma$: for some correlation matrices (as shown for
instance in Proposition 9), $T_\alpha$ with Procedure $P_1$ is not rate optimal. Nevertheless, we believe, and
this will be illustrated in Section 6, that Procedure $P_2$ slightly improves the power of the test when
the covariates are correlated.



Pl,p,n,S2(0 • — 

Then, for any < p?,p,„,Si 



1-e- 



log[l + L{a,6)p 

llog(l + L(a,J) ,^,4(^_,) ) 
ilog(l + L(a,%*2-*(l-i)) 



var(r) - 116*11 



= P 



and for any p^ < pl^p^n,!:^^^) 



var(r) - \\e\\ 



= p 
5 Rates of testing on "ellipsoids" and adaptation

In this section, we define tests $T_\alpha$ of the form (8) in order to test simultaneously $\theta = 0$ against $\theta$
belongs to some classes of ellipsoids. We will study their rates and show that they are optimal,
sometimes at the price of a $\log p$ factor.

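For any non increasing sequence $(a_i)_{1\le i\le p+1}$ such that $a_1 = 1$ and $a_{p+1} = 0$ and any $R > 0$, we
define the ellipsoid $\mathcal{E}_a(R)$ by

$$\mathcal{E}_a(R) := \Big\{\theta \in \mathbb{R}^p,\ \sum_{i=1}^{p} \frac{\mathrm{var}(Y|X_{m_{i-1}}) - \mathrm{var}(Y|X_{m_i})}{a_i^2} \le R^2\,\mathrm{var}(Y|X)\Big\}, \qquad (21)$$

where $m_i$ refers to the set $\{1, \ldots, i\}$ and $m_0 = \emptyset$.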

Let us explain why we call this set an ellipsoid. Assume for instance that the $(X_i)$ are
independent identically distributed with variance one. In this case, the difference $\mathrm{var}(Y|X_{m_{i-1}}) -
\mathrm{var}(Y|X_{m_i})$ equals $|\theta_i|^2$ and the definition of $\mathcal{E}_a(R)$ translates into

$$\mathcal{E}_a(R) = \Big\{\theta \in \mathbb{R}^p,\ \sum_{i=1}^{p} \frac{\theta_i^2}{a_i^2} \le R^2\,\mathrm{var}(Y|X)\Big\}.$$

The main difference between this definition and the classical definition of an ellipsoid in the fixed
design regression framework (as for instance in Baraud (2002)) is the presence of the term $\mathrm{var}(Y|X)$.
We added this quantity in order to be able to derive lower bounds of the minimax rate. If the $X_i$
are not i.i.d. with unit variance, it is always possible to create a sequence $X'_i$ of i.i.d. standard
Gaussian variables by orthogonalizing the $X_i$ using the Gram-Schmidt process. If we call $\theta'$ the vector
in $\mathbb{R}^p$ such that $X\theta = X'\theta'$, it is straightforward to show that $\mathrm{var}(Y|X_{m_{i-1}}) - \mathrm{var}(Y|X_{m_i}) = |\theta'_i|^2$.
We can then express $\mathcal{E}_a(R)$ using the coordinates of $\theta'$ as previously:

$$\mathcal{E}_a(R) = \Big\{\theta \in \mathbb{R}^p,\ \sum_{i=1}^{p} \frac{\theta'^2_i}{a_i^2} \le R^2\,\mathrm{var}(Y|X)\Big\}.$$

The main advantage of Definition (21) is that it does not directly depend on the covariance of $X$. In
the sequel we also consider the special case of ellipsoids with polynomial decay.



$$\mathcal{E}'_s(R) := \Big\{\theta \in \mathbb{R}^p,\ \sum_{i=1}^{p} \frac{\mathrm{var}(Y|X_{m_{i-1}}) - \mathrm{var}(Y|X_{m_i})}{i^{-2s}\,\mathrm{var}(Y|X)} \le R^2\Big\}, \qquad (22)$$

where $s > 0$ and $R > 0$. First, we define two test procedures of the form (8) and evaluate their
power respectively on the ellipsoids $\mathcal{E}_a(R)$ and on the ellipsoids $\mathcal{E}'_s(R)$. Then, we give some lower
bounds for the $(\alpha,\delta)$-simultaneous minimax rates of testing. Extensions to more general $l_p$ balls
with $0 < p < 2$ are possible at the price of more technicalities by adapting the results of Section 4
in Baraud (2002).

These alternatives correspond to the situation where we are given an order of relevance on the
covariates that are not in the null hypothesis. This order could either be provided by a previous
knowledge of the model or by a model selection algorithm such as LARS (least angle regression)
introduced by Efron et al. (2004). We apply this last method to build a collection of
models for our testing procedure in Verzelen and Villers (2007).
5.1 Simultaneous Rates of testing of $T_\alpha$ over classes of ellipsoids

First, we define a procedure of the form (8) in order to test $\theta = 0$ against $\theta$ belongs to any of the
ellipsoids $\mathcal{E}_a(R)$. For any $x > 0$, $[x]$ denotes the integer part of $x$.
We choose the class of models $\mathcal{M}$ and the weights $\alpha_m$ as follows:

• If $n < 2p$, we take the set $\mathcal{M}$ to be $\cup_{1\le k\le [n/2]} m_k$ and all the weights $\alpha_m$ are equal to $\alpha/|\mathcal{M}|$.

• If $n \ge 2p$, we take the set $\mathcal{M}$ to be $\cup_{1\le k\le p} m_k$. $\alpha_{m_p}$ equals $\alpha/2$ and for any $k$ between 1 and
$p - 1$, $\alpha_{m_k}$ is chosen to be $\alpha/(2(p-1))$.
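
The choice of models and weights described above can be sketched in a few lines of Python. This is only an illustration: the function name `nested_collection_weights` is hypothetical, and each nested model $m_k = \{1, \ldots, k\}$ is represented by its dimension $k$.

```python
def nested_collection_weights(n, p, alpha):
    """Return {k: alpha_{m_k}} for the nested models m_k = {1, ..., k}.

    Follows the two cases of the text: n < 2p (uniform weights over
    the [n/2] models) and n >= 2p (half of the level on m_p, the other
    half shared equally by m_1, ..., m_{p-1}).
    """
    if n < 2 * p:
        k_max = n // 2  # integer part of n/2
        return {k: alpha / k_max for k in range(1, k_max + 1)}
    weights = {k: alpha / (2 * (p - 1)) for k in range(1, p)}
    weights[p] = alpha / 2
    return weights


# In both regimes the weights add up to the level alpha (Bonferroni).
small_n = nested_collection_weights(n=10, p=30, alpha=0.05)
large_n = nested_collection_weights(n=100, p=30, alpha=0.05)
```

Giving half of the level to the full model $m_p$ when $n \ge 2p$ keeps the test competitive against alternatives that are spread over all $p$ coordinates.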

As previously, we bound the power of the tests Ta from a non-asymptotic point of view. 

Proposition 11. Let us assume that 



n> L 



1 



(23) 



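For any ellipsoid $\mathcal{E}_a(R)$, the test $T_\alpha$ defined by (8) with Procedure $P_1$ and with the class of models
given just above satisfies

$$\mathbb{P}_0(T_\alpha \le 0) \ge 1 - \alpha,$$
and $\mathbb{P}_\theta(T_\alpha > 0) \ge 1 - \delta$ for all $\theta \in \mathcal{E}_a(R)$ such that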



var(y) - ll^ll 



> L{a, (5) logn inf 

l<i<[n/2] 



^ n 



(24) 



if n < 2p, or 



var(r) - ll^ir 



> L{a,S) 



logp inf af+^R'' 

l<l<p— 1 \ 



A 



VP 



(25) 



if n > 2p. 

All in all, for large values of n, the rate of testing is of order sup]^<j<p 



■\/'lQg(p) 



show in the next subsection that the minimax rate of testing for an eUipsoid is of order: 



We 



sup 

l<i<p 



Besides, we prove in Proposition 16 that a loss in $\sqrt{\log\log p}$ is unavoidable if one considers the
simultaneous minimax rates of testing over a family of nested ellipsoids. Nevertheless, we do not
know if the term $\sqrt{\log(p)}$ is optimal for testing simultaneously against all the ellipsoids $\mathcal{E}_a(R)$ for
all sequences $(a_i)$ and all $R > 0$. When $n$ is smaller than $2p$, we obtain comparable results except
that we are unable to consider alternatives in large dimensions in the infimum (25).



We now turn to define a procedure of the form (8) in order to test simultaneously that $\theta = 0$
against $\theta$ belongs to any of the $\mathcal{E}'_s(R)$. For this, we introduce the following collection of models $\mathcal{M}$
and weights $\alpha_m$:

• If $n < 2p$, we take the set $\mathcal{M}$ to be $\cup\, m_k$ where $k$ belongs to $\{2^j, j \ge 0\} \cap \{1, \ldots, [n/2]\}$ and
all the weights $\alpha_m$ are chosen to be $\alpha/|\mathcal{M}|$.

• If $n \ge 2p$, we take the set $\mathcal{M}$ to be $\cup\, m_k$ where $k$ belongs to $(\{2^j, j \ge 0\} \cap \{1, \ldots, p\}) \cup \{p\}$,
$\alpha_{m_p}$ equals $\alpha/2$ and for any $k$ in the model between 1 and $p - 1$, $\alpha_{m_k}$ is chosen to be
$\alpha/(2(|\mathcal{M}| - 1))$.
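
The dyadic collection has only about $\log_2(n)$ or $\log_2(p)$ models, which is what replaces the $\log$ factor by a $\log\log$ factor in the rates. The sketch below enumerates it; the helper names `dyadic_sizes` and `dyadic_collection_weights` are hypothetical, and models are again identified with their dimension $k$.

```python
def dyadic_sizes(limit):
    """Return the powers of two 2^j (j >= 0) that are <= limit."""
    sizes, k = [], 1
    while k <= limit:
        sizes.append(k)
        k *= 2
    return sizes


def dyadic_collection_weights(n, p, alpha):
    """Return {k: alpha_{m_k}} for the dyadic collection of the text."""
    if n < 2 * p:
        sizes = dyadic_sizes(n // 2)
        return {k: alpha / len(sizes) for k in sizes}
    # n >= 2p: dyadic dimensions below p share alpha/2, m_p gets alpha/2.
    smaller = [k for k in dyadic_sizes(p) if k < p]
    weights = {k: alpha / (2 * len(smaller)) for k in smaller}
    weights[p] = alpha / 2
    return weights


few_obs = dyadic_collection_weights(n=50, p=500, alpha=0.05)
many_obs = dyadic_collection_weights(n=1000, p=30, alpha=0.05)
```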



Proposition 12. Let us assume that 









n>L 


1 + log 


{as) 







(26) 



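and that $R^2 \ge \sqrt{\log\log n}/n$. For any $s > 0$, the test procedure $T_\alpha$ defined by (8) with Procedure $P_1$
and with the class of models given just above satisfies:

$$\mathbb{P}_0(T_\alpha \le 0) \ge 1 - \alpha,$$

and $\mathbb{P}_\theta(T_\alpha > 0) \ge 1 - \delta$ for any $\theta \in \mathcal{E}'_s(R)$ such that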



var(y) - ||6l|12 
if n < 2p or 



>L{a,S) 



Vlog log n 



4s/(l+4s) 



2s , log log n 



var(y) - 116*112 



>L{a,5) 



Vloglogp 



4s/(l+4s) 



log log p 



(27) 



(28) 



if n> 2p. 

Again, we retrieve results similar to those of Corollary 2 in Baraud et al. (2003) in the fixed
design regression framework. For $s > 1/4$ and $n < 2p$, the rate of testing is of order $\left(\frac{\sqrt{\log\log n}}{n}\right)^{4s/(1+4s)}$.
We show in the next subsection that the logarithmic factor is due to the adaptive property of the test.

/ n 1 \4s/(l+4s) / s 

If s < 1/4, the rate is of order n'"^' . When n > 2p, the rate is of order / viogjogp \ ^ / ^^ \ ^ 

and we mention at the end of the next subsection that it is optimal. 

Here again, it is possible to define these tests with Procedure P2 in order to improve the power 
of the test (see Section [6] for numerical results). 



5.2 Minimax lower bounds 

We first establish the (a, (5)-minimax rate of testing over an ellipsoid when the variance of Y and 
the covariance matrix of X are known. 



Proposition 13. Let us set the sequence (ai)i<i<p+i and the positive number R. We introduce 

2 . 2 d2 



pl,n{^)-= sup [p^„Aa-i?^], 

l<i<p 

where pf „ is defined by I116\} . then for any non singular covariance matrix E we have 

>pi.rm\)>s, 



(29) 



var(y) - He'll' 
where the quantity $\mathrm{var}(Y)$ is fixed. If $\alpha + \delta \le 47\%$ then

pIJR) > sup 



l<i<p 



This lower bound is once more analogous to the one in the fixed design regression framework.
Contrary to the lower bounds obtained in the previous section, it does not depend on the covariance
of the covariates. We now look for an upper bound of the minimax rate of testing over a given
ellipsoid. First, we need to define the quantity $D^*$ as:

$$D^* := \inf\Big\{1 \le i \le p,\ a_i^2 R^2 \le \frac{\sqrt{i}}{n}\Big\}$$
with the convention that $\inf\emptyset = p$.

Proposition 14. Let us assume that n > L\og [l + log (^)] ■ If > and D* < n/2, the test 
4>mr,',a defined by satisfies 

Po [0m^. ,a = 1] < a and Fg [(j)^^, ,a = 0] < J 

for all 9 e £a{R) such that 



— ^-^ > L(a, 5) sup 

var(r) - iie*!!-^ i<i<p 



A ojR 



If $n \ge 2D^*$, the rates of testing on an ellipsoid are analogous to the rates on an ellipsoid in the fixed
design regression framework (see for instance Baraud (2002)). If $D^*$ is large and $n$ is small, the
bounds in Propositions 13 and 14 do not coincide. In this case, we do not know if this comes from
the fact that the test in Proposition 14 does not depend on the knowledge of $\mathrm{var}(Y)$ or if one of
the bounds in Propositions 13 and 14 is not sharp.

We are now interested in computing lower bounds for rates of testing simultaneously over a
family of ellipsoids, in order to compare them with rates obtained in Section 5.1. First, we need a
lower bound for the minimax simultaneous rate of testing over nested linear spaces. We recall that
for any $D \in \{1, \ldots, p\}$, $S_{m_D}$ stands for the linear space of vectors $\theta$ such that only their $D$ first
coordinates are possibly non-zero.



Proposition 15. For D > 2, let us set 



-2 

PD,n 



L{a, 6) 

Then, the following lower bound holds 



(30) 



/^^ f u 



r{Y)-\\9p 



>S, 



if for all D between 1 and p, rjj < pD,n 
Using this Proposition, it is possible to get a lower bound for the simultaneous rate of testing
over a family of nested ellipsoids.

Proposition 16. We fix a sequence $(a_i)_{1\le i\le p+1}$. For each $R > 0$, let us set

$$\rho^2_{a,R,n} := \sup_{1\le D\le p}\left[\rho^2_{D,n} \wedge (R^2 a_D^2)\right], \qquad (31)$$

where $\rho_{D,n}$ is given by (30). Then, for any non singular covariance matrix $\Sigma$ of the vector $X$,



^-(U|^^^-(^) War(;)-H^ ^Mj-^- 

This Proposition shows that adaptation is impossible in this setting: it is
impossible to define a test which is simultaneously minimax over a class of nested ellipsoids (for $R > 0$).
This is also the case in fixed design, as proved by Spokoiny (1996) for the case of Besov bodies. The
loss of a term of the order $\sqrt{\log\log p}/n$ is unavoidable.

As a special case of Proposition 16, it is possible to compute a lower bound for the simultaneous
minimax rate over $\mathcal{E}'_s(R)$ where $R$ describes the positive numbers. After some calculation, we find
a lower bound of order:



/ yioglogp N 1+''^ A log logp 
V n J ' ^ n 



This shows that the power of the test $T_\alpha$ obtained in (28) for $n \ge 2p$ is optimal when $R^2 \ge
\sqrt{\log\log n}/n$. However, when $n < 2p$ and $s \le 1/4$, we do not know if the rate $n^{-2s}$ is optimal or
not.

To conclude, when $n \ge 2p$ the test $T_\alpha$ defined in Proposition 12 achieves the simultaneous
minimax rate over the classes of ellipsoids $\mathcal{E}'_s(R)$. On the other hand, the test $T_\alpha$ defined in
Proposition 11 is not rate optimal simultaneously over all the ellipsoids $\mathcal{E}_a(R)$ and suffers a loss of
a $\sqrt{\log p}$ factor even when $n \ge 2p$.



6 Simulations studies 

The purpose of this simulation study is threefold. First, we illustrate the theoretical results
established in previous sections. Second, we show that our procedure is easy to implement for different
choices of collections $\mathcal{M}$ and is computationally feasible even when $p$ is large. Our third purpose
is to compare the efficiency of Procedures $P_1$ and $P_2$. Indeed, for a given collection $\mathcal{M}$, we know
from Section 3.2 that the test based on Procedure $P_2$ is more powerful than the corresponding
test based on $P_1$. However, the computation of the quantity $q_{X,\alpha}$ is possibly time consuming and
we therefore want to know if the benefit in power is worth the computational burden.

To our knowledge, when the number of covariates $p$ is larger than the number of observations $n$,
there is no test with which we can compare our procedure.

6.1 Simulation experiments 

We consider the regression model H]) with $\mathcal{I} = \{1, \ldots, p\}$ and test the null hypothesis "$\theta = 0$",
which is equivalent to "$Y$ is independent of $X$", at level $\alpha = 5\%$. Let $(X_i)_{1\le i\le p}$ be a collection of $p$
Gaussian variables with unit variance. The random variable $Y$ is defined as follows: $Y = \sum_{i=1}^{p} \theta_i X_i + \epsilon$,
where $\epsilon$ is a zero mean Gaussian variable with variance $1 - \|\theta\|^2$ independent of $X$.
We consider two simulation experiments described below.

1. First simulation experiment: The correlation between $X_i$ and $X_j$ is a constant $c$ for any $i \neq j$.
Besides, in this experiment the parameter $\theta$ is chosen such that only one of its components
is possibly non-zero. This corresponds to the situation considered in Section 4. First, the
number of covariates $p$ is fixed equal to 30 and the number of observations $n$ is taken equal
to 10 and 15. We choose for $c$ three different values 0, 0.1, and 0.8, thus allowing us to compare
the procedure for independent, weakly and highly correlated covariates. We estimate the size
of the test by taking $\theta_1 = 0$ and the power by taking for $\theta_1$ the values 0.8 and 0.9. These
choices of $\theta$ lead to a small and a large signal/noise ratio $r_{s/n}$, equal in this
experiment to $\theta_1^2/(1 - \theta_1^2)$. Second, we examine the behavior of the tests when $p$ increases
and when the covariates are highly correlated: $p$ equals 100 and 500, $n$ equals 10 and 15, $\theta_1$
is set to 0 and 0.8, and $c$ is chosen to be 0.8.

2. Second simulation experiment: The covariates $(X_i)_{1\le i\le p}$ are independent. The number of
covariates $p$ equals 500 and the number of observations $n$ equals 50 and 100. We set for any
$i \in \{1, \ldots, p\}$, $\theta_i = R i^{-s}$. We estimate the size of the test by taking $R = 0$ and the power by
taking for $(R, s)$ the value $(0.2, 0.5)$, which corresponds to a slow decrease of the $(\theta_i)_{1\le i\le p}$. It
was pointed out in the beginning of Section 5 that $|\theta_i|^2$ equals $\mathrm{var}(Y|X_{m_{i-1}}) - \mathrm{var}(Y|X_{m_i})$.
Thus, $|\theta_i|^2$ represents the benefit in terms of conditional variance brought by the variable $X_i$.
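
The two designs can be simulated with the standard library alone. The sketch below is an assumption-laden illustration, not the authors' code: the function names are hypothetical, and the constant correlation $c \in [0, 1]$ is obtained through a shared Gaussian factor, $X_i = \sqrt{c}\,Z_0 + \sqrt{1-c}\,Z_i$, which is one convenient construction among others.

```python
import math
import random


def signal_to_noise(theta1):
    """Signal/noise ratio theta_1^2 / (1 - theta_1^2) of the first experiment."""
    return theta1 ** 2 / (1 - theta1 ** 2)


def simulate_first_experiment(n, p, c, theta1, rng):
    """Draw n replications of (Y, X_1, ..., X_p).

    Covariates have unit variance and pairwise correlation c; only the
    first coefficient theta_1 is possibly non-zero, so var(Y) = 1.
    """
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor Z_0
        x = [math.sqrt(c) * shared + math.sqrt(1 - c) * rng.gauss(0.0, 1.0)
             for _ in range(p)]
        eps = rng.gauss(0.0, math.sqrt(1 - theta1 ** 2))
        rows.append((theta1 * x[0] + eps, x))
    return rows


def second_experiment_theta(p, R, s):
    """Coefficients theta_i = R * i^(-s), i = 1, ..., p."""
    return [R * i ** (-s) for i in range(1, p + 1)]


rng = random.Random(0)
sample = simulate_first_experiment(n=10, p=30, c=0.8, theta1=0.8, rng=rng)
theta = second_experiment_theta(p=500, R=0.2, s=0.5)
```

With $\theta_1 = 0.8$ and $\theta_1 = 0.9$ the function `signal_to_noise` returns approximately 1.78 and 4.26, the two values reported in Table 1.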

We use our testing procedure defined in (8) with different collections $\mathcal{M}$ and different choices
for the weights $\{\alpha_m, m \in \mathcal{M}\}$.

The collections $\mathcal{M}$: we define three classes. Let us set $J_{n,p} = p \wedge [n/2]$, where $[x]$ denotes the
integer part of $x$, and let us define:

$$\mathcal{M}^1 := \{\{i\},\ 1 \le i \le p\},$$
$$\mathcal{M}^2 := \{m_k = \{1, 2, \ldots, k\},\ 1 \le k \le J_{n,p}\},$$
$$\mathcal{M}^3 := \{m_k = \{1, 2, \ldots, k\},\ k \in \{2^j, j \ge 0\} \cap \{1, \ldots, J_{n,p}\}\}.$$

We evaluate the performance of our testing procedure with $\mathcal{M} = \mathcal{M}^1$ in the first simulation
experiment, and $\mathcal{M} = \mathcal{M}^2$ and $\mathcal{M}^3$ in the second simulation experiment. The cardinality of these three
collections is smaller than $p$, and the computational complexity of the testing procedures is at most
linear in $p$.

The collections $\{\alpha_m, m \in \mathcal{M}\}$: We consider Procedures $P_1$ and $P_2$ defined in Section [H. When
we are using Procedure $P_1$, the $\alpha_m$'s equal $\alpha/|\mathcal{M}|$ where $|\mathcal{M}|$ denotes the cardinality of the
collection $\mathcal{M}$. The quantity $q_{X,\alpha}$ that occurs in Procedure $P_2$ is computed by simulation. We
use 1000 simulations for the estimation of $q_{X,\alpha}$. In the sequel we denote by $T_{\mathcal{M}^i,P_j}$ the test (8) with
collection $\mathcal{M}^i$ and Procedure $P_j$.

In the first experiment, when $p$ is large we also consider two other tests:

1. The test $\phi_{\{1\},\alpha}$ of the hypothesis $\theta_1 = 0$ against the alternative $\theta_1 \neq 0$. This test
corresponds to the single test when we know which coordinate is non-zero.
2. The test $\phi_{\{p+1\},\alpha}$ where $X_{p+1} := \frac{1}{p}\sum_{i=1}^{p} X_i$. Adapting the proof of Proposition 9, we know
that this test is approximately minimax on $\Theta[1,p]$ if the correlation between the covariates is
constant and large.

Contrary to our procedures, these two tests are based on the knowledge of $\mathrm{var}(X)$ (and possibly
$\theta$). We only use them as a benchmark to evaluate the performance of our procedure. We aim at
showing that our test with Procedure $P_2$ is as powerful as $\phi_{\{p+1\},\alpha}$ and is close to the test $\phi_{\{1\},\alpha}$.

We estimate the size and the power of the testing procedures with 1000 simulations. For each
simulation, we simulate the Gaussian vector $(X_1, \ldots, X_p)$ and then simulate the variable $Y$ as
described in the two simulation experiments.
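
The Monte Carlo estimation of size and power described above amounts to a rejection frequency over independent replications. A generic sketch follows; the names `rejection_frequency`, `toy_simulate` and `toy_test` are placeholders, and the toy test (a two-sided z-test on a standard Gaussian sample) only stands in for the procedures $T_{\mathcal{M}^i,P_j}$, which are not reimplemented here.

```python
import random


def rejection_frequency(simulate, test, n_rep=1000, seed=0):
    """Estimate P(test rejects) from n_rep independent simulated data sets.

    simulate(rng) returns one data set; test(data) returns True when the
    null hypothesis is rejected. Under the null this estimates the size,
    under an alternative it estimates the power.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_rep):
        if test(simulate(rng)):
            rejected += 1
    return rejected / n_rep


def toy_simulate(rng):
    """Placeholder data generator: 20 independent N(0, 1) draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(20)]


def toy_test(sample):
    """Placeholder test: reject when |sqrt(n) * mean| exceeds 1.96."""
    return abs(sum(sample)) / len(sample) ** 0.5 > 1.96


estimated_size = rejection_frequency(toy_simulate, toy_test, n_rep=1000)
```

With 1000 replications, the standard error of an estimated size close to 5% is about 0.7%, which gives an idea of the precision of the frequencies reported in the tables.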

6.2 Results of the simulation 



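Null hypothesis is true, $\theta_1 = 0$

  n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$
  10    0.043    0.045
  15    0.044    0.049

Null hypothesis is false, $\theta_1 = 0.8$, $r_{s/n} = 1.78$

  n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$
  10    0.48     0.48
  15    0.81     0.81

Null hypothesis is false, $\theta_1 = 0.9$, $r_{s/n} = 4.26$

  n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$
  10    0.86     0.86
  15    0.99     0.99

Table 1: First simulation study, independent case: p = 30, c = 0. Percentages of rejection and
value of the signal/noise ratio $r_{s/n}$.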

The results of the first simulation experiment for $c = 0$ are given in Table 1. As expected, the
power of the tests increases with the number of observations $n$ and with the signal/noise ratio $r_{s/n}$.
If the signal/noise ratio is large enough, we obtain powerful tests even if the number of covariates
$p$ is larger than the number of observations.

In Table 2 we present results of the first simulation experiment for $\theta_1 = 0.8$ when $c$ varies.
Let us first compare the results for independent, weakly and highly correlated covariates when using
Procedure $P_1$. The size and the power of the test for weakly correlated covariates are similar to
the size and the power obtained in the independent case. Hence, we recover the remark following
Proposition 8: when the correlation coefficient between the covariates is small, the minimax rate
is of the same order as in the independent case. The test for highly correlated covariates is more
powerful than the test for independent covariates, thus recovering the remark following Theorem 7:
the worst case from a minimax rate perspective is the case where the covariates are independent. Let
us now compare Procedures $P_1$ and $P_2$. In the case of independent or weakly correlated covariates,
they give similar results. For highly correlated covariates, the power of $T_{\mathcal{M}^1,P_2}$ is much larger than
the one of $T_{\mathcal{M}^1,P_1}$.

In Table 3 we present results of the multiple testing procedure and of the two tests $\phi_{\{1\},\alpha}$ and
$\phi_{\{p+1\},\alpha}$ when $c = 0.8$ and the number of covariates $p$ is large. For $p = 500$ and $n = 15$, one test
takes less than one second with Procedure $P_1$ and less than 30 seconds with Procedure $P_2$. As
Null hypothesis is true, $\theta_1 = 0$

  c      n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$
  0      10    0.043    0.045
  0      15    0.044    0.049
  0.1    10    0.042    0.04
  0.1    15    0.058    0.06
  0.8    10    0.018    0.045
  0.8    15    0.019    0.052

Null hypothesis is false, $\theta_1 = 0.8$

  c      n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$
  0      10    0.48     0.48
  0      15    0.81     0.81
  0.1    10    0.49     0.49
  0.1    15    0.81     0.82
  0.8    10    0.64     0.77
  0.8    15    0.89     0.94

Table 2: First simulation study, independent and dependent case, p = 30. Frequencies of rejection.



Null hypothesis is true, $\theta_1 = 0$

  p      n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$    $\phi_{\{1\},\alpha}$    $\phi_{\{p+1\},\alpha}$
  100    10    0.01     0.056    0.051    0.045
  100    15    0.016    0.053    0.047    0.053
  500    10    0.009    0.044    0.040    0.040
  500    15    0.011    0.040    0.042    0.034

Null hypothesis is false, $\theta_1 = 0.8$

  p      n     $T_{\mathcal{M}^1,P_1}$    $T_{\mathcal{M}^1,P_2}$    $\phi_{\{1\},\alpha}$    $\phi_{\{p+1\},\alpha}$
  100    10    0.60     0.77     0.91     0.79
  100    15    0.85     0.92     0.99     0.92
  500    10    0.52     0.76     0.91     0.77
  500    15    0.77     0.94     0.99     0.94

Table 3: First simulation study, dependent case: c = 0.8. Frequencies of rejection.
expected, Procedure $P_1$ is too conservative when $p$ increases. For $p = 100$, the power of the test
based on Procedure $P_1$ is smaller than the power of the test $\phi_{\{p+1\},\alpha}$ and this difference increases
when $p$ is larger. The test based on Procedure $P_2$ is as powerful as $\phi_{\{p+1\},\alpha}$, and its power is close to
the one of $\phi_{\{1\},\alpha}$. We recall that this last test is based on the knowledge of the non-zero component
of $\theta$, contrary to ours. Besides, the test $\phi_{\{p+1\},\alpha}$ was shown in Proposition 9 to be optimal for
this particular correlation setting. Hence, Procedure $P_2$ seems to achieve the optimal rate in this
situation. Thus, we advise using Procedure $P_2$ in practice if the number of covariates $p$ is large,
because Procedure $P_1$ becomes too conservative, especially if the covariates are correlated.



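Null hypothesis is true, $R = 0$

  n      $T_{\mathcal{M}^2,P_1}$    $T_{\mathcal{M}^2,P_2}$    $T_{\mathcal{M}^3,P_1}$    $T_{\mathcal{M}^3,P_2}$
  50     0.013    0.052    0.036    0.059
  100    0.009    0.059    0.042    0.059

Null hypothesis is false, $R = 0.2$, $s = 0.5$

  n      $T_{\mathcal{M}^2,P_1}$    $T_{\mathcal{M}^2,P_2}$    $T_{\mathcal{M}^3,P_1}$    $T_{\mathcal{M}^3,P_2}$
  50     0.17     0.33     0.31     0.38
  100    0.42     0.66     0.62     0.69

Table 4: Second simulation study. Frequencies of rejection.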



The results of the second simulation experiment are given in Table 4. As expected, Procedure
$P_2$ improves the power of the test and the test $T_{\mathcal{M}^3,P_2}$ has the greatest power. In this setting,
one should prefer the collection $\mathcal{M}^3$ to $\mathcal{M}^2$. This was previously pointed out in Section 5 from a
theoretical point of view. Although $T_{\mathcal{M}^3,P_1}$ is conservative, it is a good compromise for practical
purposes: it is very easy and fast to implement and its performance is good.



7 Proofs of Theorem [3], Propositions [5], IE m [H, and [T4 



Proof of Theorem 3. In a nutshell, we shall prove that, conditionally to the design $\mathbf{X}$, the
distribution of the test $T_\alpha$ is the same as that of the test introduced by Baraud et al. (2003). Hence,
we may apply their non asymptotic upper bound for the power.

Distribution of $\phi_m(\mathbf{Y}, \mathbf{X})$. First, we derive the distribution of the test statistic $\phi_m(\mathbf{Y}, \mathbf{X})$
under $\mathbb{P}_\theta$. The distribution of $Y$ conditionally to the set of variables $(X_{V\cup m})$ is of the form

$$Y = \sum_{i\in V\cup m} \theta_i^{V\cup m} X_i + \epsilon^{V\cup m}, \qquad (32)$$

where the vector $\theta^{V\cup m}$ is constant and $\epsilon^{V\cup m}$ is a zero mean Gaussian variable independent of $X_{V\cup m}$,
whose variance is $\mathrm{var}(Y|X_{V\cup m})$. As a consequence, $\|\mathbf{Y} - \Pi_{V\cup m}\mathbf{Y}\|_n^2$ is exactly $\|\Pi_{(V\cup m)^\perp}\boldsymbol{\epsilon}^{V\cup m}\|_n^2$,
where $\Pi_{(V\cup m)^\perp}$ denotes the orthogonal projection along the space generated by $(\mathbf{X}_i)_{i\in V\cup m}$.
Using the same decomposition of $\mathbf{Y}$, one simplifies the numerator of $\phi_m(\mathbf{Y}, \mathbf{X})$:



in 



VUm 



Y-nv.Y|| 



nVUm 



(X, - HyX,) 



TT ^VUm 



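where $\Pi_{V^\perp\cap(V\cup m)}$ is the orthogonal projection onto the intersection between the space generated
by $(\mathbf{X}_i)_{i\in V\cup m}$ and the orthogonal of the space generated by $(\mathbf{X}_i)_{i\in V}$.

For any $i \in m$, let us consider the conditional distribution of $X_i$ with respect to $X_V$,

$$X_i = \sum_{j\in V} \theta_j^{V,i} X_j + \epsilon_i^V, \qquad (33)$$

where $\theta_j^{V,i}$ are constants and $\epsilon_i^V$ is a zero-mean Gaussian random variable whose variance is
$\mathrm{var}(X_i|X_V)$ and which is independent of $X_V$. This enables us to express

$$\mathbf{X}_i - \Pi_V\mathbf{X}_i = \Pi_{V^\perp\cap(V\cup m)}\boldsymbol{\epsilon}_i^V \quad \text{for all } i \in m.$$

Therefore, we decompose $\phi_m(\mathbf{Y}, \mathbf{X})$ in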



.(Y,X) 



Dni\\^{VUm)^^ 



VUm\\2 



(34) 



Let us define the random variables $Z_m^{(1)}$ and $Z_m^{(2)}$, where $Z_m^{(1)}$ refers to the numerator of (34) divided
by $N_m$ and $Z_m^{(2)}$ to the denominator divided by $D_m$. We now prove that $Z_m^{(1)}$ and $Z_m^{(2)}$ are
independent.

The variables $(\boldsymbol{\epsilon}_j^V)_{j\in m}$ are $\sigma(\mathbf{X}_{V\cup m})$-measurable as linear combinations of elements in $\mathbf{X}_{V\cup m}$.
Moreover, $\boldsymbol{\epsilon}^{V\cup m}$ follows a zero mean normal distribution with covariance matrix $\mathrm{var}(Y|X_{V\cup m})I_n$
and is independent of $\mathbf{X}_{V\cup m}$. As a consequence, conditionally to $\mathbf{X}_{V\cup m}$, $Z_m^{(1)}$ and $Z_m^{(2)}$ are
independent by Cochran's Theorem, as they correspond to projections onto two sets orthogonal to
each other.

As $\boldsymbol{\epsilon}_j^V$ is a linear combination of the columns of $\mathbf{X}_{V\cup m}$, $Z_m^{(1)}$ follows a non-central $\chi^2$ distribution
conditionally to $\mathbf{X}_{V\cup m}$.



(Z,«|Xyum) - var(F|Xyum)x' 



jem J 



VUniTT _V 



'-,Dr, 



We denote a^(Xyum) 



var{Y\Xvu,i 



var(y|Xvum) 
this non-centrality parameter. 



Power of Ta conditionally to Xyum- C onditionally to Xyum 

l|2003l l 



is the same as that proposed by Baraud et al iBaraud et al 

var(F|Xyum)- Arguing as in their proof of Theorem 1, there exists some quantity A 



our test statistic (/)m(Y.X) 
with n — d data and = 
(S) such that 
the procedure accepts the hypothesis with probability not larger than $\delta/2$ if $a_m^2(\mathbf{X}_{V\cup m}) > \Delta_m(\delta)$:



2.5[fc,„if„(C/) V5] log 



1 



N„ 

2Dr, 



(35) 



where Um := log(l/a„), U := log(2/(5), fc™ := 2exp{4:Ura/Nm), and 



Consequently, we have 



KM :=1 + 2J— + 2^„. 



(T, < 0|Xyum) 1 {a^(Xyum) > Am (^) } < -5/2. 



(36) 



Let us derive the distribution of the non-central parameter $a_m^2(\mathbf{X}_{V\cup m})$. First, we simplify the
projection term, as $\boldsymbol{\epsilon}_j^V$ is a linear combination of elements of $\mathbf{X}_{V\cup m}$.



Let us define 



)VUm V 



™ var(F|Xyu 

As the variable X^jem independent of Xy, and as almost surely the dimension of the 

vector space generated by Xy is d, we get 



\ai{Y\Xvum) 

Hence, applying for instance Lemma 1 in lLaurent and Massart ( 2000l ). we get 

a^(Xyum) 



> (n ~d)- 2y/{n~d)U 



< 6/2. 



Let gather (|36| with this last bound. If 



A™ (5) 



(37) 



then it holds that 

MTa < 0) < Vg (T, < 0, aliXvum) > KniS)) + ^9 K(Xyu™) < A„(J)] 
< Eg {Fe [r„ < 0, a^(Xyum) > A„^{S)\Xvum] } + 



> {n - d) - 2yJ [n ~ d)U 



< 6. 
Computation of $\kappa_m^2$. Let us now compute the quantity $\kappa_m^2$ in order to simplify Condition
(37). Let us first express $\mathrm{var}(Y|X_V)$ in terms of $\mathrm{var}(Y|X_{m\cup V})$ using the decomposition (32) of $Y$.

YaviY\Xv) = var I ^ ^^j^^^Xj + e^^" |Xy 

= varj ^ ej'u^XjlXv I +var(e^^™|Xy) 

= varl J2 Oj'^"'X,\Xv]+Yai(Y\Xvura), (38) 
as e^*-^™ is independent of Xyum- Now using the definition of ej in l|33p . it turns out that 
var| Oj''"'Xj\Xv\ = var [ ^ ej'u^X.lX, 

= --r(j:0j--eJ\X, 



var 



E ^."""S" ' (39) 



as the (ej')jem are independent of Xy- Gathering formulae l(38|) and (|39| . we get 

2 var(y|Xy) - var(F|Xyum) 

'^m = TTTn? ^ • (^0) 



Under Assumption $H_{\mathcal{M}}$, $U_m \leq N_m/10$ for all $m \in \mathcal{M}$ and $U \leq N_m/21$. Hence, the terms $U/N_m$,
$U_m/N_m$, $k_m$, and $K_m(U)$ behave like constants and it follows from (37) that $\Delta'(m) \leq \Delta(m)$, which
concludes the proof. □

Proof of Proposition 5. We first recall the classical upper bound for the binomial coefficient (see
for instance (2.9) in Massart (2007)):

$$\log|\mathcal{M}(k,p)| = \log\binom{p}{k} \leq k\log\left(\frac{ep}{k}\right).$$
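As a quick sanity check of this classical bound, the short sketch below verifies $\log\binom{p}{k} \leq k\log(ep/k)$ exhaustively on a small grid. The helper name and the grid size are illustrative choices, not part of the paper.

```python
import math


def log_binom_bound_holds(p, k):
    """Return True when log C(p, k) <= k * log(e * p / k)."""
    lhs = math.log(math.comb(p, k))
    rhs = k * math.log(math.e * p / k)
    return lhs <= rhs + 1e-12  # small slack for floating-point rounding


# Exhaustive check for 1 <= k <= p < 60.
violations = [
    (p, k)
    for p in range(1, 60)
    for k in range(1, p + 1)
    if not log_binom_bound_holds(p, k)
]
print("violations:", violations)
```

The bound is what turns the weights $\alpha_m = \alpha/|\mathcal{M}(k,p)|$ into the term $k\log(ep/k)$ used in the sequel.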



As a consequence, $\log(1/\alpha_m) \leq \log(1/\alpha) + k\log(ep/k)$. The assumption on $n$, taken with $L = 21$, therefore
implies Hypothesis $H_{\mathcal{M}}$. Hence, we are in position to apply the second result of Theorem 3.
Moreover, the assumption on $n$ implies that $n \geq 21k$, and $D_m/N_m$ is thus smaller than $1/20$ for any
model $m$ in $\mathcal{M}(k,p)$. Formula (12) in Theorem 3 then translates into



A(m) < 



k"^ log 



ep > 
V k I 



k log 



^_2_ 

^ cub 



I.IL, (fclog(f)+log(^)) 
and it follows that Proposition 5 holds.

□ 

Proof of Proposition[B We fix the constant L in Hypothesis (|20|) to be 2f log(4e) V C2 log(4) where 
the universal constant C2 is defined later in the proof. This choice of constants allows the procedure 
[sup]^<j<;p 0{i},Q/(2p)] to satisfy Hypothesis Hm- An argument similar to the proof of Proposition 
[5] allows to show easily that there exists a universal constant C such that if we set 

,f := £02IM±MM.£,„JM, ,41) 



then var(yj-||6'|p — Pi ™plies that Vg {Ta > 0) > 1 — S. Here, the factor 4 in the logarithm comes 
from the fact that some weights a™ equal a/{2p). 

Let and be two positive numbers such that var(y)-A^ ~ ^^"^ ^ ^ ©[liP] such 
that II^IP = A^. As corv{Xi, Xj) — c for any i 7^ j, it follows that var(Xp-|_i) = c + and 



C0v(y,Xp+i)2 = 



1-e 



var(y) - var(y _ (c + (1 - c)/p) A^ 



var(r|Xp+i) var(r) - (c + (1 - c)/p) A^ ' 

We now apply Theorem[3]to (l){p+i}^a/2 under Hji4. There exists a universal constant C2 such that 

(0{p+i},a/2 > 0) > 1 - ,5 if 

(c+(l-c)/p) A^ ^ 
var(r ) - (c + (1 - c)/p) A2 - n ^ V 

This last condition is implied by 

cA2 C2 , / 4 

> — log ^ 



var(y ) — cA^ n \ a(S 
which is equivalent to 

A' C2 f±\ 

var(r)-cn + cC2log(^)^°H«'5j- ^^^^ 

Let us assume that c > log (^) / log As n > 2C2 log (|^) (Hypothesis (|20l) and definition of 

L), nc > 2C2 log (t^). As a consequence, Condition l(42|) is implied by: 

p^>^logf±V (43) 



nc 

Combining gl]) and 103]) allows to conclude that Pe {Ta > 0) > 1 - 5 if 



□ 
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Proof of Proposition\ll[ We fix the constant L to 421og(80) in Hypothesis ((23l) . It follows that 
(|23)) impHes 



7i>42(log( VlogQ 



(44) 



First, we check that the test $T_\alpha$ satisfies Condition $H_{\mathcal{M}}$. As the dimension of each model is smaller
than $n/2$, for any model $m$ in $\mathcal{M}$, $N_m$ is larger than $n/2$. Moreover, for any model $m$ in $\mathcal{M}$, $\alpha_m$ is
larger than $\alpha/(2|\mathcal{M}|)$ and $|\mathcal{M}|$ is smaller than $n/2$. As a consequence, the first condition of $H_{\mathcal{M}}$ is
implied by the inequality



71 > 20 



log(^ 



(45) 



Hypothesis (|44|) implies that n/2 > 20 log (^). Besides, for any n > it holds that n/2 > 
20 log (^). Combining these two lower bounds enables to obtain (|45| . The second condition of 
Hm holds if n > 42 log (|) which is a consequence of hypothesis (|44l) . 

Let us first consider the case $n < 2p$ and apply Theorem 3 under Hypothesis $H_{\mathcal{M}}$ to $T_\alpha$: $\mathbb{P}_\theta(T_\alpha > 0) \geq
1 - \delta$ for all $\theta \in \mathbb{R}^p$ such that



, var(y)-var(y|X„J 
3z e {1, . . . , [n/2J}, ^TTT^ — ^ > C 



^log(^)+log(M) 



var(F|X„iJ n 
where $C$ is a universal constant. Let $\theta$ be an element of $\mathcal{E}_a(R)$ that satisfies



ll^f > (1 + C^) (var(y|X„J - var(r|X)) + (1 + C)var(y|X) 
for some 1 < i < [n/2]. By Hypothesis l(23|l . it holds that 



zlog(^)+log(^ 



zlog(^)+log(^ 



< 1 



(46) 



for any i between 1 and [n/2]. It is then straightforward to check that satisfies |[46| 
As 6 belongs to the set £a{R), 



a2^ivar(r|X) 



rvw ^ 2 var(y|X™-_J-var(r|X„.) 

var(F|X,„J - var(y|X) = a^_^_ivai(Y\X) — 

< a?+ivar(r|X)i?2. 
Hence, if 9 belongs to Sa{R) and satisfies 



> (1 + C)var(y|X) 



o-i+iR 







+ - log 


(-) 


/ n 


\a6J 
then $\mathbb{P}_\theta(T_\alpha \leq 0) \leq \delta$. Gathering this condition for any $i$ between 1 and $\lfloor n/2\rfloor$ allows us to conclude
that if $\theta$ satisfies



var(r) - ||0||2 
then P(,(Ta < 0) < 5. 



>{l + C) 



inf 

l<i<[Ti/2] 




Let us now turn to the case $n \geq 2p$. Let us consider $T_\alpha$ as the supremum of $p - 1$ tests of
level $\alpha/(2(p-1))$ and one test of level $\alpha/2$. By considering the first $p - 1$ tests, we obtain as in the
previous case that $\mathbb{P}_\theta(T_\alpha \leq 0) \leq \delta$ if



>{1 + C) 



inf 

l<i<(j9-l) 



var(y)- II^IP 
On the other hand, using the last test statistic 0i,q/2 

ll^ll' 




a < 0) < (5 if 



> C 



var(r) - 116*112 
Gathering these two conditions allows to prove (|25l) . 



□ 



Proof of Proposition 12. The approach behind this proof is similar to the one for Proposition 11.
We fix the constant $L$ in Assumption (26) as in the previous proof. Hence, the collection of models
$\mathcal{M}$ and the weights $\alpha_m$ satisfy hypothesis $H_{\mathcal{M}}$ as in the previous proof.



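Let us give a sharper upper bound on $|\mathcal{M}|$:

$$|\mathcal{M}| \leq 1 + \log(n/2 \wedge p)/\log(2) \leq \log(n \wedge 2p)/\log(2). \qquad (47)$$

We deduce from (47) that there exists a constant $L(\alpha,\delta)$ only depending on $\alpha$ and $\delta$ such that for
all $m \in \mathcal{M}$,

$$\log\left(\frac{1}{\alpha_m\delta}\right) \leq L(\alpha,\delta)\log\log(n \wedge p).$$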
First, let us consider the case n < 2p. We apply Theorem [3] under Assumption Hm- As in the 
proof of Proposition [m we obtain that Pe{Ta > 0) > 1 — 5 if 



ll^ll' 



KY)-\\0\\' 



inf 

ie{2i,i>0}n{l,...,[n/2]} 



^2^, j ^^-2^ j ^/^loglog T^^^ , loglogn 



It is worth noting that i?2j-2s < V»iogiogn ^^^^ 

?2,, \ 2/(l+4s) 



I > I 



i/log log n 
Under the assumption on $R$, $i^*$ is larger than one. Let us distinguish between two cases. If there
exists $i'$ in $\{2^j, j \geq 0\} \cap \{1, \ldots, \lfloor n/2\rfloor\}$ such that $i^* \leq i'$, one can take $i' \leq 2i^*$ and then



inf 

iG{2J j>0}n{l,...,[«/2]} 



j,2-2s , V» log log » 

Hi H 



< 2 



^/i' log log n 



/ A i \ 4s/(l+4s) 

< 2V2i?^/(i+4^) (^liiHl^) .(48) 



Else, we take $i' \in \{2^j, j \geq 0\} \cap \{1, \ldots, \lfloor n/2\rfloor\}$ such that $n/4 \leq i' \leq n/2$. Since $i' \leq (i^* \wedge n/2)$ we
obtain that



inf 

i<£{2i j>0}n{l,...,[n/2]} 



2 --25 



y/i log log n 



< 2i?2i'-2s < 2i?M - 



Gathering inequalities l(48|) and (|49|) allows to prove l(27|) . 



(49) 



We now turn to the case n > 2p. As in the proof of Proposition [TTl we divide the proof into 
two parts: first we give an upper bound of the power for the \^A \ — 1 first tests which define Tq, 
and then we give an upper bound for the last test 01,0/2 • Combining these two inequalities allows 
us to prove □ 



Proof of Proposition [141 We fix the constant L in the assumption as in the two previous proofs. 
We first note that the assumption on R^ implies that D* > 2. As N„i is larger than n/2, the 
4>mj:,, test clearly satisfies Condition . As a consequence, we may apply Theorem [31 Hence, 
^eiT* < 0) < (5 for any 9 such that 



var(y)-var(y|X_)^^^^^^^V^ 



(50) 



Now, we use the same sketch as in the proof of Proposition [TTJ For any 9 g Ea{R), Condition ([50] 
is equivalent to: 



ll^ll' > (var(y|X„,„.)-var(F|X)) \ l + L{a,5)- 



-Yav{Y\X)L{a,5)- 



'D* 



(51) 



Moreover, as 6 belongs to Ea{R), 

var(y|X™„.) - var(y|X) < a^.+ii?2var(r|X) < a|,. var(r|X)i?2. 
As \J15* jn is smaller than one, Condition (|5T|) is impHed by 



1^11^ 



> (1 + L(a,(5)) al.R'^ + 



'D* 



var(y) - 116*112 

As ajj^R"^ is smaller than which is smaller supi<j<p 
0) < (5 for any 9 belonging to Ea{R) such that 

' 112 >2{l + L{a,5)) sup 
var(y) - Ijt'll^ i<i<p 



n 

4 A a?i?2 



— A alR^ 



it turns out that PeiT* 



□ 
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8 Proofs of Theorem [7], Propositions [1, [6], [8], [TO], [13], [15], and 

m 



Throughout this section, we shall use the notations $\eta := 2(1 - \alpha - \delta)$ and $\mathcal{L}(\eta) := \frac{\log(1 + 2\eta^2)}{2}$.



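Proof of Theorem 7. This proof follows the general method for obtaining lower bounds described
in Section 7.1 in Baraud (2002). We first remind the reader of the main arguments of the
approach applied to our model. Let $\rho$ be some positive number and $\mu_\rho$ be some probability measure
on

$$\Theta[k,p,\rho] := \left\{\theta \in \Theta[k,p],\ \frac{\|\theta\|^2}{\mathrm{var}(Y) - \|\theta\|^2} = \rho^2\right\}.$$

We define $\mathbb{P}_{\mu_\rho} := \int \mathbb{P}_\theta\, d\mu_\rho(\theta)$ and $\Phi_\alpha$ the set of level-$\alpha$ tests of the hypothesis "$\theta = 0$". Then,

$$\begin{aligned}
\beta_I(\Theta[k,p,\rho]) &\geq \inf_{\phi_\alpha \in \Phi_\alpha}\mathbb{P}_{\mu_\rho}[\phi_\alpha = 0] \\
&\geq 1 - \alpha - \sup_{A,\ \mathbb{P}_0(A) \leq \alpha}\left|\mathbb{P}_{\mu_\rho}(A) - \mathbb{P}_0(A)\right| \\
&\geq 1 - \alpha - \frac{1}{2}\left\|\mathbb{P}_{\mu_\rho} - \mathbb{P}_0\right\|_{TV}, \qquad (52)
\end{aligned}$$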



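where $\|\mathbb{P}_{\mu_\rho} - \mathbb{P}_0\|_{TV}$ denotes the total variation norm between the probabilities $\mathbb{P}_{\mu_\rho}$ and $\mathbb{P}_0$. If we
suppose that $\mathbb{P}_{\mu_\rho}$ is absolutely continuous with respect to $\mathbb{P}_0$, we can upper bound the norm in total
variation between these two probabilities as follows. We define

$$L_{\mu_\rho}(\mathbf{Y},\mathbf{X}) := \frac{d\mathbb{P}_{\mu_\rho}}{d\mathbb{P}_0}(\mathbf{Y},\mathbf{X}).$$

Then, we get the upper bound

$$\left\|\mathbb{P}_{\mu_\rho} - \mathbb{P}_0\right\|_{TV} = \int\left|L_{\mu_\rho}(\mathbf{Y},\mathbf{X}) - 1\right| d\mathbb{P}_0(\mathbf{Y},\mathbf{X}) \leq \left(\mathbb{E}_0\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] - 1\right)^{1/2}.$$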
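The last inequality is a direct application of the Cauchy-Schwarz inequality. The short derivation below spells out this step; it writes $L$ for the likelihood ratio of the mixture with respect to $\mathbb{P}_0$ and uses only the fact that $L$ is a density with respect to $\mathbb{P}_0$, so that $\mathbb{E}_0[L] = 1$.

```latex
\begin{align*}
\int |L - 1| \, d\mathbb{P}_0
  &\leq \left( \mathbb{E}_0\!\left[ (L - 1)^2 \right] \right)^{1/2}
  && \text{(Cauchy--Schwarz)} \\
  &= \left( \mathbb{E}_0[L^2] - 2\,\mathbb{E}_0[L] + 1 \right)^{1/2} \\
  &= \left( \mathbb{E}_0[L^2] - 1 \right)^{1/2}
  && \text{since } \mathbb{E}_0[L] = 1 .
\end{align*}
```

Controlling the second moment of the likelihood ratio under the null is therefore enough to control the total variation distance.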



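Thus, we deduce from (52) that

$$\beta_I(\Theta[k,p,\rho]) \geq 1 - \alpha - \frac{1}{2}\left(\mathbb{E}_0\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] - 1\right)^{1/2}.$$

If we find a number $\rho^* = \rho^*(\eta)$ such that

$$\log\left(\mathbb{E}_0\left[L^2_{\mu_{\rho^*}}(\mathbf{Y},\mathbf{X})\right]\right) \leq \mathcal{L}(\eta), \qquad (53)$$

then for any $\rho \leq \rho^*$,

$$\beta_I(\Theta[k,p,\rho]) \geq 1 - \alpha - \frac{\eta}{2} = \delta.$$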



To apply this method, we first have to define a suitable prior $\mu_\rho$ on $\Theta[k,p,\rho]$. Let $\hat{m}$ be some
random variable uniformly distributed over $\mathcal{M}(k,p)$ and for each $m \in \mathcal{M}(k,p)$, let $\xi^m = (\xi^m_j)_{j\in m}$
be a sequence of independent Rademacher random variables. We assume that for all $m \in \mathcal{M}(k,p)$,
$\xi^m$ and $\hat{m}$ are independent. Let $\rho$ be given and $\mu_\rho$ be the distribution of the random variable
$\sum_{j\in\hat{m}}\lambda\xi^{\hat{m}}_j e_j$ where

$$\lambda^2 := \frac{\mathrm{var}(Y)\rho^2}{k(1+\rho^2)},$$

and where $(e_j)_{j\in\mathcal{I}}$ is the orthonormal family of vectors of $\mathbb{R}^p$ defined by

$$(e_j)_i = 1 \text{ if } i = j \quad\text{and}\quad (e_j)_i = 0 \text{ otherwise.}$$
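The prior just described can be simulated directly. The sketch below is a minimal illustration (the function names and the numerical values are hypothetical, chosen only for the example): it draws a support of size $k$ uniformly at random, attaches independent Rademacher signs, and checks that $\|\theta\|^2/(\mathrm{var}(Y) - \|\theta\|^2)$ equals $\rho^2$ for every draw, which is the support property used in the proof.

```python
import random


def sample_prior(p, k, rho, var_y, rng):
    """Draw theta = sum_{j in m} lambda * xi_j * e_j.

    m is a uniformly chosen subset of size k of {0, ..., p-1} and the
    xi_j are independent Rademacher signs, with
    lambda^2 = var_y * rho^2 / (k * (1 + rho^2)).
    """
    lam = (var_y * rho ** 2 / (k * (1.0 + rho ** 2))) ** 0.5
    support = rng.sample(range(p), k)  # uniform k-subset
    theta = [0.0] * p
    for j in support:
        theta[j] = lam * rng.choice((-1.0, 1.0))  # Rademacher sign
    return theta


def signal_ratio(theta, var_y):
    """Return ||theta||^2 / (var_y - ||theta||^2)."""
    sq_norm = sum(t * t for t in theta)
    return sq_norm / (var_y - sq_norm)


rng = random.Random(1)
theta = sample_prior(p=20, k=4, rho=0.7, var_y=2.5, rng=rng)
print(signal_ratio(theta, var_y=2.5))  # equals rho^2 up to rounding
```

The ratio does not depend on the random support or signs, because $\|\theta\|^2 = k\lambda^2$ for every draw.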

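Straightforwardly, $\mu_\rho$ is supported by $\Theta[k,p,\rho]$. For any $m$ in $\mathcal{M}(k,p)$ and any vector $(\zeta^m_j)_{j\in m}$
with values in $\{-1; 1\}$, let $\mu_{m,\zeta^m,\rho}$ be the Dirac measure on $\sum_{j\in m}\lambda\zeta^m_j e_j$. For any $m$ in $\mathcal{M}(k,p)$,
$\mu_{m,\rho}$ denotes the distribution of the random variable $\sum_{j\in m}\lambda\zeta^m_j e_j$ where $(\zeta^m_j)$ is a sequence of
independent Rademacher random variables. These definitions easily imply

$$L_{\mu_\rho}(\mathbf{Y},\mathbf{X}) = \frac{1}{\binom{p}{k}}\sum_{m\in\mathcal{M}(k,p)}L_{\mu_{m,\rho}}(\mathbf{Y},\mathbf{X}) = \frac{1}{2^k\binom{p}{k}}\sum_{m\in\mathcal{M}(k,p)}\ \sum_{\zeta^m\in\{-1,1\}^k}L_{\mu_{m,\zeta^m,\rho}}(\mathbf{Y},\mathbf{X}).$$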

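We aim at bounding the quantity $\mathbb{E}_0(L^2_{\mu_\rho})$ and obtaining an inequality of the form (53). First, we
work out $L_{\mu_{m,\zeta^m,\rho}}$:

$$L_{\mu_{m,\zeta^m,\rho}}(\mathbf{Y},\mathbf{X}) = \left[\frac{1}{1 - \frac{\lambda^2k}{\mathrm{var}(Y)}}\right]^{n/2}\exp\left[-\frac{\langle\mathbf{Y},\mathbf{Y}\rangle_n\,\lambda^2k}{2\,\mathrm{var}(Y)(\mathrm{var}(Y) - \lambda^2k)} + \lambda\sum_{j\in m}\zeta^m_j\frac{\langle\mathbf{Y},\mathbf{X}_j\rangle_n}{\mathrm{var}(Y) - \lambda^2k} - \lambda^2\sum_{j,j'\in m}\zeta^m_j\zeta^m_{j'}\frac{\langle\mathbf{X}_j,\mathbf{X}_{j'}\rangle_n}{2(\mathrm{var}(Y) - \lambda^2k)}\right], \qquad (54)$$

where $\langle\cdot,\cdot\rangle_n$ refers to the canonical inner product in $\mathbb{R}^n$.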

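Let us fix $m_1$ and $m_2$ in $\mathcal{M}(k,p)$ and two vectors $\zeta^1$ and $\zeta^2$ respectively associated to $m_1$ and
$m_2$. We aim at computing the quantity $\mathbb{E}_0\left(L_{\mu_{m_1,\zeta^1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{m_2,\zeta^2,\rho}}(\mathbf{Y},\mathbf{X})\right)$. First, we decompose
the set $m_1 \cup m_2$ into four sets (which possibly are empty): $m_1 \setminus m_2$, $m_2 \setminus m_1$, $m_3$, and $m_4$, where
$m_3$ and $m_4$ are defined by:

$$m_3 := \{j \in m_1 \cap m_2 \mid \zeta^1_j = \zeta^2_j\}, \qquad m_4 := \{j \in m_1 \cap m_2 \mid \zeta^1_j = -\zeta^2_j\}.$$

For the sake of simplicity, we reorder the elements of $m_1 \cup m_2$ from 1 to $|m_1 \cup m_2|$ such that
the first elements belong to $m_1 \setminus m_2$, then to $m_2 \setminus m_1$ and so on. Moreover, we define the vector
$\zeta \in \mathbb{R}^{|m_1\cup m_2|}$ such that $\zeta_j = \zeta^1_j$ if $j \in m_1$ and $\zeta_j = \zeta^2_j$ if $j \in m_2 \setminus m_1$. Using these notations, we
compute the expectation of $L_{\mu_{m_1,\zeta^1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{m_2,\zeta^2,\rho}}(\mathbf{Y},\mathbf{X})$:

$$\mathbb{E}_0\left(L_{\mu_{m_1,\zeta^1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{m_2,\zeta^2,\rho}}(\mathbf{Y},\mathbf{X})\right) = \left[\frac{1}{\mathrm{var}(Y)\left(1 - \frac{\lambda^2k}{\mathrm{var}(Y)}\right)^2}\right]^{n/2}|A|^{-n/2}, \qquad (55)$$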
where $|\cdot|$ refers to the determinant and $A$ is a symmetric square matrix of size $|m_1 \cup m_2| + 1$ such
that:



var(y)+A'^fc 
var(Y)(var(y)-A2fc) 

var(i')-A2fc 

^var(Y)-A^fc 




if 



if {j — 1) G miATO2 
if (j - 1) e TO3 

if - 1) G 7714 , 



where 777i A7772 refers to (7711 U 7772) \ (7711 n 7772). For any i > 1 and j > 1, A satisfies 



\2 Ci-iCj-i 

var(i')-A^fc 



A[7,j] < 



\2 G-iCj-i I X 

var(y)-A2fc 
_ \2 Ci-iQ-i 
var(y)-A2fc 

var(y)-A2fc + f i j 

else, 



if 
if 
if 
if 



{i — 1, j — 1) G (m,! \ 7^2) X TTll 

(7 — 1, J — 1) G (70,2 \ mi) X (70,2 \ 7771 U 7)7.3) 

(7 — 1, J — 1) G (7772 \ 777l) X 7774 

(i — 1, J — 1) G [7773 X 7773] U [7774 X 7774] 



where <5ij is the indicator function of 7 = j. 

After some linear transformation on the lines of the matrix A, it is possible to express its 
determinant into 

var(r) + A^fc 



\A\ 



var(y)(var(r) - A^fc) 



|-^|miUm2| + ^\ I 



where I\ 



\m1Um2 



I is the identity matrix of size |777i U 7772I. C is a symmetric matrix of size |777i U 7772] 



such that for any (7, j), 
and I? is a block symmetric matrix defined by 



D := 



A*fe 


-A^var(y) 


-A^ 


A^ 1 


var^(y)-A4fc^ 
-A^var(y) 


var^ti'j-A'ife^ 

\*k 


var(y)+A2fc 

-A^ 


var(Y)-A^/c 

-A2 


var2(y)--A4fc2 

-A^ 


var2(i')-A4fe2 

-A^ 


var(y)+A2fc 

-2A^ 


var(Y)-A2/c 


2A^ 


var(y)+A^fc 

A^ 


var(y)+A^fc 

-A^ 


var(y)+A^fc 



L var(y)-A^fe 


var(y)-A^fc 


var(Y)-A^fc J 



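Each block corresponds to one of the four previously defined subsets of $m_1 \cup m_2$ (i.e. $m_1 \setminus m_2$,
$m_2 \setminus m_1$, $m_3$, and $m_4$). The matrix $D$ is of rank at most four. By computing its non-zero eigenvalues,
it is then straightforward to derive the determinant of $A$:

$$|A| = \frac{\left[\mathrm{var}(Y) - \lambda^2(2|m_3| - |m_1 \cap m_2|)\right]^2}{\mathrm{var}(Y)(\mathrm{var}(Y) - \lambda^2k)^2}.$$

Gathering this equality with (55) yields

$$\mathbb{E}_0\left(L_{\mu_{m_1,\zeta^1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{m_2,\zeta^2,\rho}}(\mathbf{Y},\mathbf{X})\right) = \left[\frac{1}{1 - \frac{\lambda^2(2|m_3| - |m_1\cap m_2|)}{\mathrm{var}(Y)}}\right]^{n}. \qquad (56)$$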
Then, we take the expectation with respect to $\zeta^1$, $\zeta^2$, $m_1$ and $m_2$. When $m_1$ and $m_2$ are
fixed, the expression (56) depends on $\zeta^1$ and $\zeta^2$ only through the cardinality of $m_3$. As $\zeta^1$ and $\zeta^2$
correspond to independent Rademacher variables, the random variable $2|m_3| - |m_1 \cap m_2|$ follows
the distribution of $Z$, a sum of $|m_1 \cap m_2|$ independent Rademacher variables, and

$$\mathbb{E}_0\left(L_{\mu_{m_1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{m_2,\rho}}(\mathbf{Y},\mathbf{X})\right) = \mathbb{E}\left[\left(\frac{1}{1 - \frac{\lambda^2Z}{\mathrm{var}(Y)}}\right)^{n}\right]. \qquad (57)$$



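When $Z$ is non-positive, the expression inside the expectation is smaller than one. Alternatively, when $Z$ is non-negative:

$$\begin{aligned}
\left(\frac{1}{1 - \frac{\lambda^2Z}{\mathrm{var}(Y)}}\right)^{n} &= \exp\left[n\log\left(\frac{1}{1 - \frac{\lambda^2Z}{\mathrm{var}(Y)}}\right)\right] \\
&\leq \exp\left[n\,\frac{\frac{\lambda^2Z}{\mathrm{var}(Y)}}{1 - \frac{\lambda^2Z}{\mathrm{var}(Y)}}\right] \\
&\leq \exp\left[n\,\frac{\frac{\lambda^2Z}{\mathrm{var}(Y)}}{1 - \frac{\lambda^2k}{\mathrm{var}(Y)}}\right],
\end{aligned}$$

as $\log(1+x) \leq x$ and as $Z$ is smaller than $k$. We define an event $A$ such that $\{Z > 0\} \subset A \subset \{Z \geq 0\}$
and $\mathbb{P}(A) = \frac{1}{2}$. This is always possible as the random variable $Z$ is symmetric. As a consequence,
on the event $A^c$, the quantity inside the expectation in (57) is smaller than or equal to one. All in all, we bound (57) by: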



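$$\mathbb{E}_0\left(L_{\mu_{m_1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{m_2,\rho}}(\mathbf{Y},\mathbf{X})\right) \leq \frac{1}{2} + \mathbb{E}\left[\mathbf{1}_A\exp\left(n\,\frac{\frac{\lambda^2Z}{\mathrm{var}(Y)}}{1 - \frac{\lambda^2k}{\mathrm{var}(Y)}}\right)\right], \qquad (58)$$

where $\mathbf{1}_A$ is the indicator function of the event $A$. We now apply Hölder's inequality with a
parameter $v \in\, ]0;1]$, which will be fixed later:

$$\begin{aligned}
\mathbb{E}\left[\mathbf{1}_A\exp\left(n\,\frac{\frac{\lambda^2Z}{\mathrm{var}(Y)}}{1 - \frac{\lambda^2k}{\mathrm{var}(Y)}}\right)\right] &\leq \mathbb{P}(A)^{1-v}\left(\mathbb{E}\left[\exp\left(\frac{n}{v}\,\frac{\frac{\lambda^2Z}{\mathrm{var}(Y)}}{1 - \frac{\lambda^2k}{\mathrm{var}(Y)}}\right)\right]\right)^{v} \\
&\leq \left(\frac{1}{2}\right)^{1-v}\left[\cosh\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2k)}\right)\right]^{|m_1\cap m_2|v}. \qquad (59)
\end{aligned}$$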



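Gathering inequalities (58) and (59) yields

$$\mathbb{E}_0\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] \leq \frac{1}{2} + \left(\frac{1}{2}\right)^{1-v}\frac{1}{\binom{p}{k}^2}\sum_{m_1,m_2\in\mathcal{M}(k,p)}\left[\cosh\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2k)}\right)\right]^{|m_1\cap m_2|v}.$$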

Following the approach of Baraud (2002) in Section 7.2, we note that if $m_1$ and $m_2$ are
taken uniformly and independently in $\mathcal{M}(k,p)$, then $|m_1 \cap m_2|$ is distributed as a Hypergeometric
distribution with parameters $p$, $k$, and $k/p$. Thus, we derive that

$$\mathbb{E}_0\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] \leq \frac{1}{2} + \left(\frac{1}{2}\right)^{1-v}\mathbb{E}\left(\left[\cosh\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2k)}\right)\right]^{vT}\right), \qquad (60)$$

where $T$ is a random variable distributed according to a Hypergeometric distribution with parameters
$p$, $k$ and $k/p$. We know from Aldous (1985, p. 173) that $T$ has the same distribution
as the random variable $\mathbb{E}(W|\mathcal{B}_p)$ where $W$ is a binomial random variable of parameters $k$, $k/p$ and
$\mathcal{B}_p$ some suitable $\sigma$-algebra. By a convexity argument, we then upper bound (60).
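The convexity argument says that, for any $x \geq 1$, the hypergeometric moment $\mathbb{E}[x^T]$ is dominated by the binomial moment $\mathbb{E}[x^W] = (1 + \frac{k}{p}(x-1))^k$, since $t \mapsto x^t$ is convex. The sketch below checks this numerically with exact probability mass functions. It is only an illustration under assumed parameter values, and the helper names are not from the paper.

```python
import math


def hypergeom_pmf(t, p, k):
    """P(|m1 ∩ m2| = t) for two independent uniform k-subsets of a p-set."""
    if t < 0 or t > k:
        return 0.0
    # math.comb(n, r) returns 0 when r > n, which handles impossible overlaps.
    return math.comb(k, t) * math.comb(p - k, k - t) / math.comb(p, k)


def mean_power_hypergeom(x, p, k):
    """E[x^T] for T hypergeometric (overlap of two random k-subsets)."""
    return sum(hypergeom_pmf(t, p, k) * x ** t for t in range(k + 1))


def mean_power_binomial(x, p, k):
    """E[x^W] for W binomial with parameters k and k/p (closed form)."""
    return (1.0 + (k / p) * (x - 1.0)) ** k


p, k = 50, 5
x = math.cosh(1.3) ** 0.8  # plays the role of cosh(...)^v with v = 0.8
print(mean_power_hypergeom(x, p, k), "<=", mean_power_binomial(x, p, k))
```

The binomial closed form is what produces the term $k\log(1 + \frac{k}{p}(\cosh^v(\cdot) - 1))$ in the next display.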



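$$\begin{aligned}
\mathbb{E}_0\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] &\leq \frac{1}{2} + \left(\frac{1}{2}\right)^{1-v}\mathbb{E}\left(\left[\cosh\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2k)}\right)\right]^{vW}\right) \\
&= \frac{1}{2} + \left(\frac{1}{2}\right)^{1-v}\left[1 + \frac{k}{p}\left(\cosh^v\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2k)}\right) - 1\right)\right]^{k} \\
&= \frac{1}{2} + \left(\frac{1}{2}\right)^{1-v}\exp\left[k\log\left(1 + \frac{k}{p}\left(\cosh^v\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2k)}\right) - 1\right)\right)\right].
\end{aligned}$$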

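To get the upper bound on the total variation distance appearing in (52), we aim at constraining
this last expression to be smaller than $1 + \eta^2$. This is equivalent to the following inequality:

$$2^{v}\exp\left[k\log\left(1 + \frac{k}{p}\left(\cosh^v\left(\frac{n\lambda^2k}{vk(\mathrm{var}(Y) - \lambda^2k)}\right) - 1\right)\right)\right] \leq 1 + 2\eta^2. \qquad (61)$$

We now choose $v = \frac{\mathcal{L}(\eta)}{\log(2)} \wedge 1$. If $v$ is strictly smaller than one, then (61) is equivalent to:

$$k\log\left(1 + \frac{k}{p}\left(\cosh^v\left(\frac{n\lambda^2k}{vk(\mathrm{var}(Y) - \lambda^2k)}\right) - 1\right)\right) \leq \frac{\log(1 + 2\eta^2)}{2}. \qquad (62)$$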



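It is straightforward to show that this last inequality also implies (61) if $v$ equals one. We now
suppose that

$$\frac{n\lambda^2k}{vk(\mathrm{var}(Y) - \lambda^2k)} \leq \log\left((1+u)^{1/v} + \sqrt{(1+u)^{2/v} - 1}\right), \qquad (63)$$

where $u := \frac{p\,\mathcal{L}(\eta)}{k^2}$. Using the classical equality $\cosh\left[\log\left(1 + x + \sqrt{2x + x^2}\right)\right] = 1 + x$ with
$x = (1+u)^{1/v} - 1$, we deduce that inequality (63) implies (62) because

$$k\log\left(1 + \frac{k}{p}\left(\cosh^v\left(\frac{n\lambda^2k}{vk(\mathrm{var}(Y) - \lambda^2k)}\right) - 1\right)\right) \leq k\log\left(1 + \frac{k}{p}u\right) \leq \frac{k^2}{p}u \leq \mathcal{L}(\eta).$$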
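The role of the threshold in (63) can be made concrete: it is exactly the value $t$ at which $\cosh^v(t) = 1 + u$. The sketch below checks the identity $\cosh[\log(1 + x + \sqrt{2x + x^2})] = 1 + x$ and the resulting threshold numerically; the function names and the test values of $u$ and $v$ are illustrative assumptions.

```python
import math


def cosh_of_log(x):
    """Evaluate cosh(log(1 + x + sqrt(2x + x^2))), which should equal 1 + x."""
    return math.cosh(math.log(1.0 + x + math.sqrt(2.0 * x + x * x)))


def threshold(u, v):
    """Right-hand side of (63): log((1+u)^(1/v) + sqrt((1+u)^(2/v) - 1))."""
    a = (1.0 + u) ** (1.0 / v)
    return math.log(a + math.sqrt(a * a - 1.0))


u, v = 0.3, 0.6
t = threshold(u, v)
# At the threshold, cosh(t)^v equals 1 + u; below it, cosh(t)^v <= 1 + u.
print(math.cosh(t) ** v, 1.0 + u)
```

Since $\cosh$ is increasing on the positive half-line, any argument below the threshold gives $\cosh^v \leq 1 + u$, which is the step used to pass from (63) to (62).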



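For any $\beta \geq 1$ and any $x > 0$, it holds that $(1+x)^\beta \geq 1 + \beta x$. As $\frac{1}{v} \geq 1$, Condition (63) is implied
by:

$$\frac{\lambda^2k}{\mathrm{var}(Y) - \lambda^2k} \leq \frac{kv}{n}\log\left(1 + \frac{u}{v} + \sqrt{\frac{2u}{v}}\right).$$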
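One then combines the previous inequality with the definitions of $u$ and $v$ to obtain the upper
bound

$$\frac{\lambda^2k}{\mathrm{var}(Y) - \lambda^2k} \leq \frac{k}{n}\left(\frac{\mathcal{L}(\eta)}{\log(2)} \wedge 1\right)\log\left(1 + \frac{p(\log(2) \vee \mathcal{L}(\eta))}{k^2} + \sqrt{\frac{2p(\log(2) \vee \mathcal{L}(\eta))}{k^2}}\right).$$

For any positive $x$ and any $u$ between 0 and 1, $\log(1 + ux) \geq u\log(1 + x)$. As a consequence, the
previous inequality is implied by:

$$\frac{\lambda^2k}{\mathrm{var}(Y) - \lambda^2k} \leq \frac{k}{n}\left(\frac{\mathcal{L}(\eta)}{\log(2)} \wedge 1\right)\left(\left[\mathcal{L}(\eta) \vee \log(2)\right] \wedge 1\right)\log\left(1 + \frac{p}{k^2} + \sqrt{\frac{2p}{k^2}}\right).$$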
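The inequality $\log(1 + ux) \geq u\log(1 + x)$, used here and again in later proofs, is a one-line consequence of the concavity of the logarithm. A short derivation is given below for completeness; it is a standard argument and is not taken from the original text.

```latex
\begin{align*}
\log(1 + ux)
  &= \log\bigl( u\,(1 + x) + (1 - u)\cdot 1 \bigr)
  && \text{for } u \in [0,1],\ x > 0 \\
  &\geq u \log(1 + x) + (1 - u)\log(1)
  && \text{(concavity of } \log\text{)} \\
  &= u \log(1 + x).
\end{align*}
```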



To sum up, if we take $\rho^2$ smaller than (17), then

$$\beta_I(\Theta[k,p,\rho]) \geq \delta.$$

Besides, the lower bound is strict if $\rho^2$ is strictly smaller than (17). To prove the second part of the
theorem, one has to observe that $\alpha + \delta \leq 53\%$ implies that $\mathcal{L}(\eta) \geq \frac{1}{2}$. □

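Proof of Proposition 6. Let us first assume that the covariance matrix of $X$ is the identity. We
argue as in the proof of Theorem 7 taking $k = p$. The sketch of the proof remains unchanged except
that we slightly modify the last part. Inequality (62) becomes

$$pv\log\cosh\left(\frac{n\lambda^2}{v(\mathrm{var}(Y) - \lambda^2p)}\right) \leq \mathcal{L}(\eta),$$

where we recall that $v = \frac{\mathcal{L}(\eta)}{\log(2)} \wedge 1$. For all $x \in \mathbb{R}$, $\cosh(x) \leq \exp(x^2/2)$. Consequently, the previous
inequality is implied by

$$\frac{\lambda^2p}{\mathrm{var}(Y) - \lambda^2p} \leq \sqrt{2v\mathcal{L}(\eta)}\,\frac{\sqrt{p}}{n},$$

and the result follows easily.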
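For completeness, the bound $\cosh(x) \leq \exp(x^2/2)$ invoked above follows from a term-by-term comparison of the two power series. The derivation below is a standard argument added as a reminder, not a quotation from the paper.

```latex
\begin{align*}
\cosh(x) = \sum_{j \geq 0} \frac{x^{2j}}{(2j)!}
  \;\leq\; \sum_{j \geq 0} \frac{x^{2j}}{2^{j}\, j!}
  = \exp\!\left( \frac{x^2}{2} \right),
\end{align*}
% because (2j)! = \prod_{i=1}^{j} (2i)(2i-1) \geq \prod_{i=1}^{j} 2i = 2^{j} j! .
```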

If we no longer assume that the covariance matrix $\Sigma$ is the identity, we orthogonalize the sequence
$X_i$ by the Gram-Schmidt process. Applying the previous argument to this new sequence of
covariates allows us to conclude. □

Proof of Proposition\^ Let define the constant L{a, S) involved in the condition: 



i(a, 6) := 0og(l + 8(l-a-(5)2) 1 A ^log (1 + 8(1 - a - S)^) /(21og2) 
Let us apply proposition El For any p < L{a,S)^^^- and any <^ > there exists some 6 € 5m 

such tll3,t YQiX ( y ) — 1 1 j [ ^ — ^ Qjlld ^oi^^m^a 

< 0) > S — <;. In the proof of Theorem [31 we have shown 
in ((34|) and following equalities that the distribution of the test statistic only depends on the 



RR n° 6354 



38 



Verzelen & Villers 



quantity = ^y^^. ( y^^^ J j • Let 9' be an element of Sm such that = ■ The distribution 
of 0m under Pg' is the same as its distribution under Pg, and therefore 

Ps' < 0) > ,5 - ^. 

Letting go to enables to conclude. 

□ 

Proof of Proposition This lower bound for dependent gaussian covariates is proved through the 
same approach as Theorem 7. We define the measure $\mu_\rho$ as in that proof. Under the hypothesis $H_0$,
$Y$ is independent of $X$. We denote by $\Sigma$ the covariance matrix of $X$, and $\mathbb{E}_{0,\Sigma}$ stands for the expectation
with respect to the distribution of $(\mathbf{Y},\mathbf{X})$ under $H_0$, in order to emphasize the dependence on $\Sigma$.

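First, one has to upper bound the quantity $\mathbb{E}_{0,\Sigma}\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right]$. For the sake of simplicity, we
make the hypothesis that every covariate $X_j$ has variance 1. If this is not the case, we only have
to rescale these variables. The quantity $\mathrm{corr}(i,j)$ refers to the correlation between $X_i$ and $X_j$. As
we only consider the case $k = 1$, the set of models $m$ in $\mathcal{M}(1,p)$ is in correspondence with the set
$\{1, \ldots, p\}$, and

$$\mathbb{E}_{0,\Sigma}\left(L_{\mu_{i,\zeta^1,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{j,\zeta^2,\rho}}(\mathbf{Y},\mathbf{X})\right) = \left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - \zeta^1\zeta^2\,\mathrm{corr}(i,j)\lambda^2}\right)^{n}.$$

When $i$ and $j$ are fixed, we upper bound the expectation of this quantity with respect to $\zeta^1$ and $\zeta^2$
by

$$\mathbb{E}_{0,\Sigma}\left(L_{\mu_{i,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{j,\rho}}(\mathbf{Y},\mathbf{X})\right) \leq \frac{1}{2} + \frac{1}{2}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - |\mathrm{corr}(i,j)|\lambda^2}\right)^{n}. \qquad (64)$$

If $i \neq j$, $|\mathrm{corr}(i,j)|$ is smaller than $c$ and if $i = j$, $\mathrm{corr}(i,j)$ is exactly one. As a consequence, taking
the expectation of (64) with respect to $i$ and $j$ yields the upper bound

$$\mathbb{E}_{0,\Sigma}\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] \leq \frac{1}{2} + \frac{1}{2p}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - \lambda^2}\right)^{n} + \frac{p-1}{2p}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - c\lambda^2}\right)^{n}. \qquad (65)$$

Recall that we want to constrain this quantity (65) to be smaller than $1 + \eta^2$. In particular, this
holds if the two following inequalities hold:

$$\frac{1}{p}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - \lambda^2}\right)^{n} \leq \frac{1}{p} + \eta^2, \qquad (66)$$

$$\frac{p-1}{p}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - c\lambda^2}\right)^{n} \leq \frac{p-1}{p} + \eta^2. \qquad (67)$$

One then uses the inequality $\log\left(\frac{1}{1-x}\right) \leq \frac{x}{1-x}$ which holds for any positive $x$ smaller than one.
Condition (66) holds if

$$\frac{\lambda^2}{\mathrm{var}(Y) - \lambda^2} \leq \frac{1}{n}\log(1 + p\eta^2), \qquad (68)$$

whereas Condition (67) is implied by

$$\frac{c\lambda^2}{\mathrm{var}(Y) - c\lambda^2} \leq \frac{1}{n}\log\left(1 + \frac{p}{p-1}\eta^2\right).$$

As $c$ is smaller than one and $\frac{p}{p-1}$ is larger than 1, this last inequality holds if

$$\frac{\lambda^2}{\mathrm{var}(Y) - \lambda^2} \leq \frac{1}{nc}\log(1 + \eta^2). \qquad (69)$$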
Gathering conditions fM)) and l(69|) allows to conclude and to obtain the desired lower bound 

□ 

Proof of Proposition [751 The sketch of the proof and the notations are analogous to the one in 
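the previous proof. The upper bound (64) still holds:

$$\mathbb{E}_{0,\Sigma}\left(L_{\mu_{i,\rho}}(\mathbf{Y},\mathbf{X})L_{\mu_{j,\rho}}(\mathbf{Y},\mathbf{X})\right) \leq \frac{1}{2} + \frac{1}{2}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - |\mathrm{corr}(i,j)|\lambda^2}\right)^{n}.$$

Using the stationarity of the covariance function, we derive from (64) the following upper bound:

$$\mathbb{E}_{0,\Sigma}\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right] \leq \frac{1}{2} + \frac{1}{2p}\sum_{i=0}^{p-1}\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - \lambda^2|\mathrm{corr}(0,i)|}\right)^{n},$$

where $\mathrm{corr}(0,i)$ equals $\mathrm{corr}(X_1, X_{i+1})$. As previously, we want to constrain this quantity to be
smaller than $1 + \eta^2$. In particular, this is implied if for any $i$ between 0 and $p - 1$:

$$\left(\frac{\mathrm{var}(Y)}{\mathrm{var}(Y) - \lambda^2|\mathrm{corr}(i,0)|}\right)^{n} \leq 1 + \frac{2p\eta^2|\mathrm{corr}(i,0)|}{\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)|}.$$

Using the inequality $\log(1 + u) \leq u$, it is straightforward to show that this previous inequality holds
if

$$\frac{\lambda^2}{\mathrm{var}(Y) - \lambda^2|\mathrm{corr}(i,0)|} \leq \frac{1}{n|\mathrm{corr}(i,0)|}\log\left(1 + \frac{2p\eta^2|\mathrm{corr}(0,i)|}{\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)|}\right).$$

As $|\mathrm{corr}(i,0)|$ is smaller than one for any $i$ between 0 and $p - 1$, it follows that $\mathbb{E}_{0,\Sigma}\left[L^2_{\mu_\rho}(\mathbf{Y},\mathbf{X})\right]$
is smaller than $1 + \eta^2$ if

$$\rho^2 \leq \bigwedge_{i=0}^{p-1}\frac{1}{n|\mathrm{corr}(i,0)|}\log\left(1 + \frac{2p\eta^2|\mathrm{corr}(0,i)|}{\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)|}\right).$$

We now apply the convexity inequality $\log(1 + ux) \geq u\log(1 + x)$, which holds for any positive $x$
and any $u$ between 0 and 1, to obtain the condition

$$\rho^2 \leq \frac{1}{n}\log\left(1 + \frac{2p\eta^2}{\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)|}\right). \qquad (70)$$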



It turns out we only have to upper bound the sum of |corr(i,0)| for the different types of 
correlation: 
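1. For $\mathrm{corr}(i,j) = \exp(-w|i-j|_p)$, the sum is clearly bounded by $1 + 2\frac{e^{-w}}{1-e^{-w}}$ and Condition
(70) simplifies as

$$\rho^2 \leq \frac{1}{n}\log\left(1 + 2p\eta^2\,\frac{1 - e^{-w}}{1 + e^{-w}}\right).$$

2. If $\mathrm{corr}(i,j) = (1 + |i-j|_p)^{-t}$ for $t$ strictly larger than one, then $\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)| \leq 1 + \frac{2}{t-1}$
and Condition (70) simplifies as

$$\rho^2 \leq \frac{1}{n}\log\left(1 + 2p\eta^2\,\frac{t-1}{t+1}\right).$$

3. If $\mathrm{corr}(i,j) = (1 + |i-j|_p)^{-1}$, then $\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)| \leq 1 + 2\log(p-1)$ and Condition (70)
simplifies as

$$\rho^2 \leq \frac{1}{n}\log\left(1 + \frac{2p\eta^2}{1 + 2\log(p-1)}\right).$$

4. If $\mathrm{corr}(i,j) = (1 + |i-j|_p)^{-t}$ for $0 < t < 1$, then

$$\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)| \leq 1 + \frac{2}{1-t}\left(\frac{p}{2}\right)^{1-t}$$

and Condition (70) simplifies as

$$\rho^2 \leq \frac{1}{n}\log\left(1 + p^t2^{1-t}(1-t)\eta^2\right).$$

□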
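The bounds on $\sum_{i=0}^{p-1}|\mathrm{corr}(i,0)|$ in the first two cases of the list above can be checked numerically. The sketch below assumes that $|i-j|_p$ denotes the toroidal distance $\min(|i-j|, p-|i-j|)$; this reading, the helper names and the parameter grid are assumptions made for the illustration.

```python
import math


def torus_distance(i, j, p):
    """Toroidal distance on {0, ..., p-1} (assumed meaning of |i - j|_p)."""
    d = abs(i - j) % p
    return min(d, p - d)


def correlation_sum(corr, p):
    """Sum over i = 0..p-1 of |corr(d(i, 0))| for a distance-based correlation."""
    return sum(abs(corr(torus_distance(i, 0, p))) for i in range(p))


def exp_bound(w):
    """Case 1 bound: 1 + 2 e^{-w} / (1 - e^{-w})."""
    return 1.0 + 2.0 * math.exp(-w) / (1.0 - math.exp(-w))


def poly_bound(t):
    """Case 2 bound (t > 1): 1 + 2 / (t - 1)."""
    return 1.0 + 2.0 / (t - 1.0)


p, w, t = 101, 0.5, 2.0
print(correlation_sum(lambda d: math.exp(-w * d), p), "<=", exp_bound(w))
print(correlation_sum(lambda d: (1.0 + d) ** (-t), p), "<=", poly_bound(t))
```

Plugging either bound into Condition (70) gives the simplified conditions stated in cases 1 and 2.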



Proo/ of Proposition flgl For each dimension D between 1 and p, we define rj^ = Pd aj^R"^. Let 
us fix some Z? G {1, . . . Since r^, < a|, and since the Oj's are non increasing, 

for all e 5™„ such that yar(yH|9|p = ""l- Indeed, = Ef=i var(r|X™^._ J - var(r|X™J 
and var(F) — = var(F|X). As a consequence, 

llfll|2 



^ var(y) - II^IP - '^^1 ^ r ^ var(r) - " ""^ 

Since r/j < p£),„, we deduce from Proposition [6] that 

^-(h^°^^ Kar(J)-||.p ^-^})^^- 

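The first result of Proposition 13 follows by gathering these lower bounds for all $D$ between 1 and
$p$.

Moreover, $\rho^2_{D,n}$ is defined in Proposition 6 as $\rho^2_{D,n} = \sqrt{2}\left[\sqrt{\mathcal{L}(\eta)} \wedge \frac{\mathcal{L}(\eta)}{\sqrt{\log 2}}\right]\frac{\sqrt{D}}{n}$. If $\alpha + \delta \leq 47\%$, it
is straightforward to show that $\rho^2_{D,n} \geq \frac{\sqrt{D}}{n}$.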

□ 
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Proof of Proposition [75l We first need the following Lemma. 

Lemma 17. We consider a partition of 2. For each j ^ J let p{j) ~ \Ij\. For any j G , 

we define &j as the set of 9 W such that their support is included in Ij . For any sequence of 
positive weights kj such that 



it holds that 



if for all j E J , rj < Pp{j),n{v/ \^)! where the function Pp{j).n is defined by fid]) . 

For all j > such that 2^+^ - 1 G X (i.e. for all j < J where J = log(p + l)/log_(2) - 1 ), let 
Sj be the linear span of the e^'s for k g {2-'', . . . , 2-''+^ - 1}. Then, dim(S'j) = 2^ and Sj C S^^ for 
D — D{j) = 2-'+^ — 1. It is straightforward to show that 

J J p 

j=0 j=0 D=l 

where SjiroU)] := e Sj, var(y)'L||gP " ^du)} and Smoiro] [o G 5',„o, var(y)-l|gp " ^d}- 
Let choose J ^ {!,..., J}. For any j e J, we define Ij ^ {2^ 2^ + 1, . . . , 2^+^ - l}. Applying 
Lemma [T7l with kj := [{j + l)R{p)]^^ where R{p) := X^Lo ^/(^ + 1) get 



y l^e^™,, 

\d=i ^ 



m 



var(y)-||0|P 



> S 



if for all those D ^ D{j) 



r^ < ^log(l + 2,yk,) ^1 A ^_ j — . 

For D = D{j), this last quantity is lower bounded by 

It remains to check that (|7T1) is larger than PD{j),n- Using j + 1 — log{D + 1)/ log(2) > log(-D + 1), 
we get 2-'/^ > y/D/2. Thanks to the convexity inequality log(l + ux) > Mlog(l + x), which holds 
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for any a; > and any u e]0, 1], we obtain 

v/log(l + 2j72(j + l)i?(p))2^/2 > /d72 (77v/2i?(^ A l) ^log [1 + log(^ + 1)] 

((77\/2) A l) 0oglog(i? + l)/D72, 

(l A VMlTV)) Vloglog(i^ + 1) 

as -R(p) is larger than one for any p > 1. All in all, we get the lower bound 



> 
> 



/2 

n 



D 



> — == (1 A log(l + 277^)) Vloglog(I? + l)^ = pI 



Thus, if for all 1 < D < p, r|j is smaller than p% it holds that 



var(y) - 116*1 



1 ^ ' D 



□ 



Proof of Lemma [TtI Using a similar approach to the proof of Theorem [TJ we know that for each 
Tj < Pj{ill\/kj) there exists some measure Hj over 



such that 



En 



var(y) - ||6l||^ 



(72) 



We now define a probability measure p. = '^j,=jkjPj over Ujej^il^j]- ^i-h refers to the density 
of P^j- with respect to Pq. Thus, 



and 



EoK(Y,X)]= 5] fc,fc,vEo[L^,(Y,X)L^^,(Y,X) 
Using expression l(56|) . it is straightforward to show that if j ^ /, then 



En 



L^^(Y,X)L..,(Y,X) 



1. 
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This follows from the fact that the sets Qj and 8^' are orthogonal with respect to the inner product 
©. Thus, 



Eo [L^(Y, X)] - 1 + ^ fcf (Eo [lI^ (Y, X)' 



I] <l + V 



□ 



thanks to ((72l) . Using the argument l(53|) as in the proof of Theorem [7] allows to conclude. 

Proof of Proposition\l(A First of all, we only have to consider the case where the covariance matrix 
of X is the identity. If this is not the case, one only has to apply Gram-Schmidt process to X and 
thus obtain a vector X' and a new basis for W which is orthonormal. We refer to the beginning of 
Section El for more details. 

Lik e the previous bounds for ellipsoids, we adapt the approach of Section 6 in Baraud BaraudI 
( 2002f ). We use the same notations as in proof of Proposition [131 Let D*{R) € {1, . . . ,p} an integer 
which achieves the supremum of p'j^ A {R'^ay) = ffj. As in proof of Proposition [131 for any R> 0, 



^^'"-•(«)'var(r)-||0||2 



c\0e SaiR), 



m 



var(y) - ||6'| 



2 - ^D*(,B) 



When R varies, D*{R) describes {1, . . . ,p}. Thus, we obtain 

\\or 



U 

l<D<p 



eS„ 



var(y) - \\9\\ 



^ ri>o ^ 



var(y) - ||6'| 



2 - "^D'iK) 



var(r) 



> r 



and the result follows from proposition [T5l 



□ 



Appendix 

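Proof of Proposition H The test associated with Procedure $P_1$ corresponds to a Bonferroni procedure.
Hence, we prove that its size is less than $\alpha$ by arguing as follows: let $\theta$ be an element of $S_V$
(defined in Section 2.2).

$$\mathbb{P}_\theta(T_\alpha > 0) \leq \sum_{m\in\mathcal{M}}\mathbb{P}_\theta\left(\phi_m(\mathbf{Y},\mathbf{X}) - \bar{F}^{-1}_{D_m,N_m}(\alpha_m) > 0\right),$$

where $\phi_m(\mathbf{Y},\mathbf{X})$ is the test statistic of the procedure. The hypothesis is rejected if for some model $m$, $\phi_m(\mathbf{Y},\mathbf{X})$ is larger than
$\bar{F}^{-1}_{D_m,N_m}(\alpha_m)$. As $\theta$ belongs to $S_V$, $\Pi_{V\cup m}\mathbf{Y} - \Pi_V\mathbf{Y} = \Pi_{V\cup m}\boldsymbol{\epsilon} - \Pi_V\boldsymbol{\epsilon}$ and $\mathbf{Y} - \Pi_{V\cup m}\mathbf{Y} = \boldsymbol{\epsilon} - \Pi_{V\cup m}\boldsymbol{\epsilon}$.
Then, the quantity $\phi_m(\mathbf{Y},\mathbf{X})$ is equal to

$$\phi_m(\mathbf{Y},\mathbf{X}) = \frac{N_m\left\|\Pi_{V\cup m}\boldsymbol{\epsilon} - \Pi_V\boldsymbol{\epsilon}\right\|_n^2}{D_m\left\|\boldsymbol{\epsilon} - \Pi_{V\cup m}\boldsymbol{\epsilon}\right\|_n^2}.$$

Because $\boldsymbol{\epsilon}$ is independent of $\mathbf{X}$, the distribution of $\phi_m(\mathbf{Y},\mathbf{X})$ conditionally to $\mathbf{X}$ is a Fisher distribution
with $D_m$ and $N_m$ degrees of freedom. As a consequence, $\phi_{m,\alpha_m}(\mathbf{Y},\mathbf{X})$ is a Fisher test with
$D_m$ and $N_m$ degrees of freedom. It follows that:

$$\mathbb{P}_\theta(T_\alpha > 0) \leq \sum_{m\in\mathcal{M}}\alpha_m \leq \alpha.$$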
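The union bound behind Procedure $P_1$ can be illustrated by a small simulation. The sketch below is not the paper's procedure: it replaces the Fisher statistics by abstract p-values that are uniform under the null, and it assumes independence between models purely for simplicity (the Bonferroni bound itself needs no such assumption). All names and numerical values are illustrative.

```python
import random


def bonferroni_size(weights, trials, rng):
    """Estimate the size of a test rejecting when some p-value is <= its weight.

    Each p-value is uniform on [0, 1] under the null hypothesis, so model m
    is rejected with probability weights[m]; the union bound gives a size of
    at most sum(weights).
    """
    rejections = 0
    for _ in range(trials):
        if any(rng.random() <= a for a in weights):
            rejections += 1
    return rejections / trials


alpha = 0.05
weights = [alpha / 10] * 10  # alpha_m = alpha / |M|, so that sum(alpha_m) = alpha
rng = random.Random(0)
size = bonferroni_size(weights, trials=20000, rng=rng)
print(f"estimated size {size:.4f}, nominal level {alpha}")
```

In this independent toy setting the exact size is $1 - (1 - \alpha/10)^{10}$, slightly below $\alpha$, which illustrates why the Bonferroni calibration is conservative.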
The test associated with Procedure $P_2$ has the property of being of size exactly $\alpha$. More precisely,
for any $\theta \in S_V$, we have that

$$\mathbb{P}_\theta(T_\alpha > 0 \mid \mathbf{X}) = \alpha, \quad \mathbf{X} \text{ a.s.}$$
The result follows from the fact that gx.a satisfies 

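and that for any $\theta \in S_V$, $\Pi_{V\cup m}\mathbf{Y} - \Pi_V\mathbf{Y} = \Pi_{V\cup m}\boldsymbol{\epsilon} - \Pi_V\boldsymbol{\epsilon}$ and $\mathbf{Y} - \Pi_{V\cup m}\mathbf{Y} = \boldsymbol{\epsilon} - \Pi_{V\cup m}\boldsymbol{\epsilon}$.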

□ 

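Proof of Proposition Let us come back to the definitions of $T^1_\alpha$ and $T^2_\alpha$:

$$T^1_\alpha(\mathbf{X},\mathbf{Y}) = \sup_{m\in\mathcal{M}}\left\{\phi_m(\mathbf{Y},\mathbf{X}) - \bar{F}^{-1}_{D_m,N_m}(\alpha/|\mathcal{M}|)\right\},$$
$$T^2_\alpha(\mathbf{X},\mathbf{Y}) = \sup_{m\in\mathcal{M}}\left\{\phi_m(\mathbf{Y},\mathbf{X}) - \bar{F}^{-1}_{D_m,N_m}(q_{\mathbf{X},\alpha})\right\}.$$

Conditionally on $\mathbf{X}$, the size of $T^1_\alpha$ is smaller than $\alpha$, whereas the size of $T^2_\alpha$ is exactly $\alpha$. As a
consequence, $q_{\mathbf{X},\alpha} \geq \alpha/|\mathcal{M}|$, as the statistics $T^1_\alpha$ and $T^2_\alpha$ differ only through these quantities. Thus,
$T^2_\alpha(\mathbf{X},\mathbf{Y}) \geq T^1_\alpha(\mathbf{X},\mathbf{Y})$, $(\mathbf{X},\mathbf{Y})$ almost surely, and the result follows. □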
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